This is a 200-level run-through of the Microsoft Azure Big Data Analytics for the Cloud data platform, based on the Cortana Intelligence Suite offerings.
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
The document discusses Azure Data Factory V2 data flows. It provides an introduction to Azure Data Factory, explains concepts like pipelines, linked services, and data flows, and guides a hands-on demo in which attendees build a simple data flow that joins customer data to postal district data to add the matching postal towns (a sketch of the equivalent join logic is shown below).
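The demo itself is built visually in the ADF data flow designer. As a rough, hedged illustration of the join it performs, here is a minimal pandas sketch; the column names (CustomerID, Postcode, PostalTown) and sample values are assumptions for the example, not taken from the deck.

```python
import pandas as pd

# Hypothetical stand-ins for the demo's customer and postal district datasets.
customers = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Name": ["Avery", "Blake", "Casey"],
    "Postcode": ["SW1A", "M1", "EH1"],
})
postal_districts = pd.DataFrame({
    "Postcode": ["SW1A", "M1", "EH1"],
    "PostalTown": ["London", "Manchester", "Edinburgh"],
})

# Left join keeps every customer row and adds its matching postal town,
# mirroring what the data flow's join transformation does visually.
enriched = customers.merge(postal_districts, on="Postcode", how="left")
print(enriched)
```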
This document discusses the evolution of data warehousing and the modern data platform. It outlines some common problems with traditional data warehousing approaches like long setup times, poor performance and scalability issues. The modern data platform combines cloud-based data warehousing, data modeling principles, and data warehouse automation tools to provide highly scalable and agile solutions. Key components demonstrated are the Snowflake data platform for scalable data storage and processing, Fivetran for automated data integration, and capabilities like cloning data for testing and time travel to access historical data.
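As a hedged sketch of the cloning and time-travel capabilities mentioned above, the following uses the snowflake-connector-python package; the connection values, the orders table, and the one-hour offset are placeholders for illustration.

```python
import snowflake.connector

# Placeholder connection details; substitute your own account and credentials.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="ANALYTICS_WH", database="SALES", schema="PUBLIC",
)
cur = conn.cursor()

# Zero-copy clone: create a test copy of a table without duplicating storage.
cur.execute("CREATE OR REPLACE TABLE orders_test CLONE orders")

# Time travel: query the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM orders AT (OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()
```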
The document discusses Azure Data Factory v2. It provides an agenda that includes topics like triggers, control flow, and executing SSIS packages in ADFv2. It then introduces the speaker, Stefan Kirner, who has over 15 years of experience with Microsoft BI tools. The rest of the document consists of slides on ADFv2 topics like the pipeline model, triggers, activities, integration runtimes, scaling SSIS packages, and notes from the field on using SSIS packages in ADFv2.
Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for Azure. Designed in collaboration with the founders of Apache Spark, Azure Databricks combines the best of Databricks and Azure to help customers accelerate innovation with one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. As an Azure service, customers automatically benefit from the native integration with other Azure services such as Power BI, SQL Data Warehouse, and Cosmos DB, as well as from enterprise-grade Azure security, including Active Directory integration, compliance, and enterprise-grade SLAs.
Big data architectures and the data lake - James Serra
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (SMP) architectures
Azure data analytics platform - A reference architecture - Rajesh Kumar
This document provides an overview of Azure data analytics architecture using the Lambda architecture pattern. It covers Azure data and services, including ingestion, storage, processing, analysis and interaction services. It provides a brief overview of the Lambda architecture including the batch layer for pre-computed views, speed layer for real-time views, and serving layer. It also discusses Azure data distribution, SQL Data Warehouse architecture and design best practices, and data modeling guidance.
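To make the batch/speed/serving split concrete, here is a tiny self-contained Python sketch of the serving-layer idea (an illustration, not from the deck): a query merges a precomputed batch view with the incremental counts held by the speed layer.

```python
from collections import Counter

# Batch layer: precomputed page-view counts, recomputed from the master dataset on a schedule.
batch_view = {"home": 10_000, "pricing": 2_500}

# Speed layer: incremental counts for events that arrived after the last batch run.
speed_view = Counter()
for event in [{"page": "home"}, {"page": "pricing"}, {"page": "home"}]:
    speed_view[event["page"]] += 1

def serve(page: str) -> int:
    """Serving layer: merge the batch view with the real-time delta at query time."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)

print(serve("home"))     # 10002
print(serve("pricing"))  # 2501
```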
Azure Data Factory is a data integration service that allows for data movement and transformation between both on-premises and cloud data stores. It uses datasets to represent data structures, activities to define actions on data with pipelines grouping related activities, and linked services to connect to external resources. Key concepts include datasets representing input/output data, activities performing actions like copy, and pipelines logically grouping activities.
This document provides an overview and summary of the author's background and expertise. It states that the author has over 30 years of experience in IT working on many BI and data warehouse projects. It also lists that the author has experience as a developer, DBA, architect, and consultant. It provides certifications held and publications authored as well as noting previous recognition as an SQL Server MVP.
Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. In this session we will learn how to create data integration solutions using the Data Factory service, ingest data from various data stores, transform/process the data, and publish the resulting data to data stores.
Big data is driving transformative changes in traditional data warehousing. Traditional ETL processes and highly structured data schemas are being replaced with schema flexibility to handle all types of data from diverse sources. This allows for real-time experimentation and analysis beyond just operational reporting. Microsoft is applying lessons from its own big data journey to help customers by providing a comprehensive set of Apache big data tools in Azure along with intelligence and analytics services to gain insights from diverse data sources.
Delta Lake is an open-source innovation that brings new capabilities for transactions, version control, and indexing to your data lakes. We uncover Delta Lake's benefits and why it matters to you. Through this session, we showcase some of those benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which supports concurrent read/write operations and enables efficient inserts, updates, deletes, and rollbacks. It allows background file optimization through compaction and Z-order partitioning for better performance. In this presentation, we will learn about Delta Lake's benefits, how it solves common data lake challenges, and, most importantly, the new Delta Time Travel capability.
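As a hedged illustration of the upsert and Time Travel capabilities described above, here is a minimal PySpark sketch; it assumes a Spark session already configured with the delta-spark package, and the table path /tmp/delta/events is a placeholder.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake extensions are configured

path = "/tmp/delta/events"  # placeholder table location

# Write an initial version of the table.
spark.createDataFrame([(1, "open"), (2, "open")], ["id", "status"]) \
    .write.format("delta").mode("overwrite").save(path)

# Upsert (MERGE): update matching rows, insert new ones.
updates = spark.createDataFrame([(2, "closed"), (3, "open")], ["id", "status"])
DeltaTable.forPath(spark, path).alias("t").merge(
    updates.alias("u"), "t.id = u.id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

# Time Travel: read the table as of its first committed version.
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```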
Databricks offers a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training, and deploying models on that data, and managing the whole workflow throughout the project. It is aimed at those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Spark Streaming, and the Machine Learning library (MLlib). It has built-in integration with many data sources, a workflow scheduler, real-time workspace collaboration, and performance improvements over traditional Apache Spark.
1- Introduction of Azure data factory.pptx - BRIJESH KUMAR
Azure Data Factory is a cloud-based data integration service that allows users to easily construct extract, transform, load (ETL) and extract, load, transform (ELT) processes without code. It offers job scheduling, security for data in transit, integration with source control for continuous delivery, and scalability for large data volumes. The document demonstrates how to create an Azure Data Factory from the Azure portal.
Big Data Analytics in the Cloud with Microsoft Azure - Mark Kromer
Big Data Analytics in the Cloud using Microsoft Azure services was discussed. Key points included:
1) Azure provides tools for collecting, processing, analyzing and visualizing big data including Azure Data Lake, HDInsight, Data Factory, Machine Learning, and Power BI. These services can be used to build solutions for common big data use cases and architectures.
2) U-SQL is a language for preparing, transforming and analyzing data that allows users to focus on the what rather than the how of problems. It uses SQL and C# and can operate on structured and unstructured data.
3) Visual Studio provides an integrated environment for authoring, debugging, and monitoring U-SQL scripts and jobs.
First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” to creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
The document discusses modern data architectures. It presents conceptual models for data ingestion, storage, processing, and insights/actions. It compares traditional vs modern architectures. The modern architecture uses a data lake for storage and allows for on-demand analysis. It provides an example of how this could be implemented on Microsoft Azure using services like Azure Data Lake Storage, Azure Databricks, and Azure Data Warehouse. It also outlines common data management functions such as data governance, architecture, development, operations, and security.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
Spark as a Service with Azure Databricks - Lace Lofranco
Presented at: Global Azure Bootcamp (Melbourne)
Participants will get a deep dive into one of Azure's newest offerings: Azure Databricks, a fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure. In this session, we will go through Azure Databricks' key collaboration features, cluster management, and tight data integration with Azure data sources. We'll also walk through an end-to-end Recommendation System Data Pipeline built using Spark on Azure Databricks.
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service that supports multiple APIs such as SQL, Cassandra, MongoDB, Gremlin and Azure Table. It allows storing entities with automatic partitioning and provides automatic online backups every 4 hours with the latest 2 backups stored. The Azure Cosmos DB change feed and Data Migration Tool allow importing and exporting data for backups. An emulator is also available for trying Cosmos DB locally without an Azure account.
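For orientation, here is a minimal, hedged sketch using the azure-cosmos Python SDK against the SQL API; the endpoint, key, database, and container names are placeholders.

```python
from azure.cosmos import CosmosClient

# Placeholders: substitute your Cosmos DB account endpoint and key (or point at the local emulator).
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("appdb").get_container_client("orders")

# Upsert a document; the partition key value comes from the item body.
container.upsert_item({"id": "order-1", "customerId": "c-42", "total": 99.5})

# Query with the SQL API across partitions.
results = container.query_items(
    query="SELECT c.id, c.total FROM c WHERE c.customerId = @cust",
    parameters=[{"name": "@cust", "value": "c-42"}],
    enable_cross_partition_query=True,
)
for item in results:
    print(item)
```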
The world of business intelligence and analytics has shifted from one in which IT provided the information in an organisation to self-service data analytics, where end users can access and consume data from a platform such as a data warehouse and enrich their analytics with other data sources. In addition, users can now easily share content and collaborate with other users. This has enabled businesses to make better decisions in a more agile way. However, giving users a lot of freedom to access and share data without any governance can expose businesses to serious security and privacy risks.
In this session, we discussed what governance means when it comes to Power BI and how to implement an organisation-wide governance framework for the Power BI ecosystem without preventing the business from naturally growing its analytics capability, along with some real-world examples and good practices.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft's strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
Build Real-Time Applications with Databricks Streaming - Databricks
This document discusses using Databricks, Spark, and Power BI for real-time data streaming. It describes a use case of a fire department needing real-time reporting of equipment locations, personnel statuses, and active incidents. The solution involves ingesting event data using Azure Event Hubs, processing the stream using Databricks and Spark Structured Streaming, storing the results in Delta Lake, and visualizing the data in Power BI dashboards. It then demonstrates the architecture by walking through creating Delta tables, streaming from Event Hubs to Delta Lake, and running a sample event simulator.
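A hedged PySpark sketch of the streaming leg of that architecture follows; it assumes a Databricks cluster with the Azure Event Hubs Spark connector installed, and the connection string, checkpoint location, and Delta path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Placeholder Event Hubs connection; the connector expects the string to be encrypted.
eh_conf = {
    "eventhubs.connectionString":
        spark._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(
            "Endpoint=sb://<namespace>.servicebus.windows.net/;...;EntityPath=incidents")
}

# Read the event stream, cast the payload to text, and append it to a Delta table.
stream = (spark.readStream
          .format("eventhubs")
          .options(**eh_conf)
          .load()
          .select(col("body").cast("string").alias("payload"),
                  col("enqueuedTime").alias("event_time")))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/incidents")  # placeholder path
       .outputMode("append")
       .start("/mnt/delta/incidents"))  # placeholder Delta path
```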
Suhail Jamaldeen is a Microsoft consultant and trainer who specializes in Office 365 and Azure. He discusses key topics related to cloud computing including the characteristics, models, and services. Microsoft Azure is introduced as a cloud platform that allows users to build, deploy, and manage applications across global data centers.
This document provides an overview of Azure Databricks, including:
- Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It includes Spark SQL, streaming, machine learning libraries, and integrates fully with Azure services.
- Clusters in Azure Databricks provide a unified platform for various analytics use cases. The workspace stores notebooks, libraries, dashboards, and folders. Notebooks provide a code environment with visualizations. Jobs and alerts can run and notify on notebooks.
- The Databricks File System (DBFS) stores files in Azure Blob storage in a distributed file system accessible from notebooks. Business intelligence tools can connect to Databricks clusters via JDBC (see the sketch below).
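As a hedged illustration of working with DBFS, the sketch below is meant to run inside a Databricks notebook, where spark and dbutils are provided by the runtime; the mount point, file name, and table name are placeholders.

```python
# Inside a Databricks notebook, `spark` and `dbutils` are predefined by the runtime.

# List files that DBFS exposes on top of Azure Blob storage (placeholder mount point).
for info in dbutils.fs.ls("/mnt/datalake/raw"):
    print(info.path, info.size)

# Read one file into a DataFrame and register it as a table, which BI clients
# connected to the cluster over JDBC/ODBC can then query.
df = spark.read.option("header", "true").csv("/mnt/datalake/raw/sales.csv")
df.write.mode("overwrite").saveAsTable("sales_raw")
```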
Azure DevDays - Business benefits of native cloud applications - lofbergfredrik
Why software developers should build cloud-native applications from a business point of view. How Microsoft Azure supports business and consumer application development. How easy prototyping makes it easier for the business to know what they actually want and what they are getting. The importance of WOW-level UX in cloud applications and the ready Azure infrastructure for handling authentication for business applications.
In June 2017 at the DevOps Enterprise Summit in London, while announcing the 2017 State of DevOps Report with his esteemed colleagues, Jez Humble revealed that their studies showed a strong correlation between high-functioning teams and the architecture of the software they are building, deploying, and managing. In short: architecture matters to DevOps.
In this talk Cornelia goes over a host of software architectural patterns and their relationship to some of the key goals of DevOps: "higher throughput and higher quality and stability." Cloud-native applications and cloud-native data are both covered.
Where SOA and monolithic EARs have failed: it's not simple to have your apps scale automagically without a very complex architecture. We're going to show the pros and cons of so-called Cloud-Native Applications based on microservices, CaaS, DevOps, Continuous Delivery...
Infinite power at your fingertips with Microsoft Azure Cloud & ActiveEon - Activeeon
Joint talk Microsoft-ActiveEon at Cloud Expo Europe, Big Data Analytics and Cloud management theater. Presenters: Christopher Plieger, Microsoft Azure Product Marketing Manager, and Denis Caromel, CEO - ActiveEon
The document discusses getting started with cloud-native development and provides an overview of Oracle's cloud platform for application development. The platform supports building modern cloud-native applications using technologies like microservices, containers, and mobile development tools, and allows developers to test and deploy applications in the cloud with services for continuous delivery, scaling, and monitoring. It also highlights Oracle's developer automation, Java, and container cloud services that help developers build, deploy, and manage applications in a cloud environment.
Building scalable cloud-native applications (Sam Vanhoutte at Codit Azure Paa... - Codit
This document discusses learnings from building scalable cloud-native solutions. It covers considerations for scalability like decoupling services, partitioning data and throttling external communications. Specific patterns for communication between services are examined like asynchronous messaging for durability and load leveling. Testing is emphasized to ensure solutions can scale out. Designing for changes in the cloud over time with new services and features is also advised.
This document discusses cloud-native data and patterns for managing data in microservices architectures. It describes using data services and APIs to interface with existing data sources. Patterns like caching data at the edge with various caching strategies are discussed. The document also covers using multiple small databases with each microservice rather than a shared database. Event sourcing and CQRS patterns are presented as ways to integrate data across services. Finally, the impact on roles like database administrators is considered in cloud-native data environments.
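To ground one of those patterns, here is a minimal, generic cache-aside sketch in Python; it is an illustration of the caching strategy mentioned above, not code from the talk, and the loader function is hypothetical.

```python
import time

class CacheAside:
    """Cache-aside: check the cache first; on a miss, load from the source of truth and cache it."""

    def __init__(self, loader, ttl_seconds=60):
        self._loader = loader      # function that fetches from the backing database/service
        self._ttl = ttl_seconds
        self._cache = {}           # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                         # cache hit
        value = self._loader(key)                   # cache miss: go to the source of truth
        self._cache[key] = (value, time.time() + self._ttl)
        return value

# Hypothetical loader standing in for a per-service database query.
profiles = CacheAside(loader=lambda user_id: {"id": user_id, "name": f"user-{user_id}"})
print(profiles.get(42))  # miss -> loads and caches
print(profiles.get(42))  # hit -> served from cache
```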
1) The document describes an Azure Resource Manager (ARM) template for deploying OpenShift Enterprise on Azure. It provisions masters, infra nodes, and worker nodes with load balancing and storage.
2) The ARM template automates the entire deployment process through nested templates for each resource and Bash scripts for configuration. It handles naming, load balancing, storage, networking, and more.
3) The goal is to create a production-ready reference architecture for OpenShift on Azure and automate the deployment process through the ARM template. Current work focuses on deployment, storage, authentication, and documentation. Future work includes additional features and integrations.
Agile Development and DevOps in the Oracle Cloud - jeckels
A broad overview of how Oracle is delivering on the latest generation of development tools and frameworks to help modern enterprises succeed. From Oracle Open World 2016, all rights reserved.
The Application Server Platform of the Future - Container & Cloud Native and ... - Lucas Jellema
New architecture patterns are rapidly influencing many organizations. The march to the cloud is taking place: DevOps and microservices for true agility, and containers as the vehicle for delivery, testing, and management. During Oracle OpenWorld 2017, Oracle presented its vision and roadmap in the area of cloud native computing (which is based on container native) and announced its application server platform (container management runtime) of the future. This presentation summarizes the picture painted by Oracle.
To really take advantage of cloud, software must be optimized to run in the cloud. This presentation explores what it means to be "Cloud Native" and looks at a real open source project that has built a complete Cloud Native platform. Cloud is not just a better way to run existing software, there are core enhancements that need to be made to software to enable it to run really effectively in a cloud environment. Often the first thought is about massive scalability, but actually there are other key enablers: multi-tenancy, metering, dynamic distribution, self-service and incremental deployment and testability. This presentation explores these enablers and looks at how an Open Source project (Carbon) built on Apache technology was re-built to be cloud native. The presentation will cover not just the concepts but dive into the practical issues in making a cloud native system and also explore which Apache technologies can help along the way.
Slides given at Agile 2015 to support talk with Josh Long
Walks through basic ideas of Cloud Foundry BOSH, Cloud Foundry Elastic Runtime and Spring Boot/Spring Cloud.
Covered these slides in ~20 minutes, then did 50 minutes of Lattice demos and Spring live coding.
Landscape Cloud-Native Roadshow Los Angeles - VMware Tanzu
This document contains the slides from a Cloud-Native Roadshow presentation by Pivotal on cloud-native architectures, platforms, processes and culture. Some key points discussed include:
- The definition and principles of cloud-native including microservices, automation, collaboration and structured platforms
- Architectural approaches like test-driven development, continuous delivery and infrastructure as code
- Cultural aspects like DevOps, CALMS, SRE and CRE
- Pivotal Cloud Foundry as a cloud-native platform providing runtime, services and integration
- Observability, SLIs, SLOs and error budgets for reliability
- The evolution of cloud-native and principles like the CAP theorem
The ability to deliver software is no longer a differentiator. In fact, it is a basic requirement for survival. Companies that embrace cloud native patterns of software delivery will survive; companies that don’t - will not.
In this webinar, we will:
- Look at the common patterns that distinguish cloud native companies and the architectures that they employ.
- Discover that an opinionated platform, one that stretches from the infrastructure all the way to the application framework, rather than ad-hoc automation, is an essential component to an enterprise's cloud native journey.
- Show that the combination of Pivotal Cloud Foundry and Spring is the complete cloud native platform.
Speaker:
Faiz Parkar
DIRECTOR OF PRODUCT MARKETING
As Director of Product Marketing for Pivotal in the Europe, Middle East and Africa region, Faiz Parkar loves working at the intersection of cloud native platforms, big data/analytics and agile application development to help organisations deliver compelling data-driven software experiences for their customers. With more than 25 years' experience in the IT industry, Faiz has helped organisations large and small to take advantage of technology transitions from proprietary systems to client/server, from physical infrastructure to virtual, and from virtual infrastructure to cloud. His mission now is to help organisations accelerate their digital transformation journey and reinvent themselves as the digital leaders of the future.
Microservices + Oracle: A Bright Future - Kelly Goetsch
The document provides an overview of a presentation on microservices and Oracle products. It begins with copyright information and a safe harbor statement. It then outlines the agenda, which includes an introduction to microservices, their history, prerequisites for adopting them, how to implement them, and how Oracle products support them. The document discusses how microservices decompose monolithic applications into independent, self-contained services.
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning... - Kai Wähner
Talk from JavaOne 2017: Apache Kafka + Kafka Streams for Scalable, Mission Critical Deep Learning.
Intelligent real time applications are a game changer in any industry. Deep Learning is one of the hottest buzzwords in this area. New technologies like GPUs combined with elastic cloud infrastructure enable the sophisticated usage of artificial neural networks to add business value in real world scenarios. Tech giants use it e.g. for image recognition and speech translation. This session discusses some real-world scenarios from different industries to explain when and how traditional companies can leverage deep learning in real time applications.
This session shows how to deploy Deep Learning models into real time applications to do predictions on new events. Apache Kafka will be used to execute analytic models in a highly scalable and performant way.
The first part introduces the use cases and concepts behind Deep Learning. It discusses how to build Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Autoencoders leveraging open source frameworks like TensorFlow, DeepLearning4J or H2O.
The second part shows how to deploy the built analytic models to real time applications leveraging Apache Kafka as streaming platform and Apache Kafka’s Streams API to embed the intelligent business logic into any external application or microservice.
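The session itself uses Apache Kafka's Streams API (Java) to embed the model; as a hedged Python sketch of the same consume-score-produce idea, the following uses the confluent-kafka client with a stub in place of a trained model. The broker address, topic names, and scoring function are assumptions for the example.

```python
import json
from confluent_kafka import Consumer, Producer

def score(features):
    """Stub standing in for a pre-trained deep learning model (e.g. a loaded TensorFlow model)."""
    return sum(features) / max(len(features), 1)

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "scoring-app",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["events"])                # placeholder input topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        prediction = score(event.get("features", []))
        # Emit the enriched event to an output topic for downstream consumers.
        producer.produce("predictions", json.dumps({**event, "prediction": prediction}))
        producer.poll(0)
finally:
    consumer.close()
    producer.flush()
```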
Some further material around Apache Kafka and Machine Learning:
- Blog Post: How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka: https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
- Video: Build and Deploy Analytic Models with H2O.ai and Apache Kafka: https://www.youtube.com/watch?v=-q7CyIExBKM&feature=youtu.be
- Code: Github Examples using Apache Kafka, TensorFlow, H2O, DeepLearning4J: https://github.com/kaiwaehner/kafka-streams-machine-learning-examples
The document discusses leveraging the cloud to architect digital solutions. It covers state-of-the-art IoT technology, machine learning clustering and classification prototypes, Cortana analytics, and patterns and anti-patterns for building solutions. The document demonstrates table storage and machine learning clustering of data. It presents an Azure IoT reference architecture and discusses visualizing machine learning results and deriving business value from big data.
Azure Stream Analytics: Analyse Data in Motion - Ruhani Arora
The document discusses evolving approaches to data warehousing and analytics using Azure Data Factory and Azure Stream Analytics. It provides an example scenario of analyzing game usage logs to create a customer profiling view. Azure Data Factory is presented as a way to build data integration and analytics pipelines that move and transform data between on-premises and cloud data stores. Azure Stream Analytics is introduced for analyzing real-time streaming data using a declarative query language.
Enabling Next Gen Analytics with Azure Data Lake and StreamSets - Streamsets Inc.
This document discusses enabling next generation analytics with Azure Data Lake. It provides definitions of big data and discusses how big data is a cornerstone of Cortana Intelligence. It also discusses challenges with big data like obtaining skills and determining value. The document then discusses Azure HDInsight and how it provides a cloud Spark and Hadoop service. It also discusses StreamSets and how it can be used for data movement and deployment on an Azure VM or a local machine. Finally, it discusses a use case of StreamSets at a major bank to move data from on-premises to Azure Data Lake and consolidate migration tools.
QuerySurge Slide Deck for Big Data Testing Webinar - RTTS
This is a slide deck from QuerySurge's Big Data Testing webinar.
Learn why testing is pivotal to the success of your Big Data Strategy.
Learn more at www.querysurge.com
The growing variety of new data sources is pushing organizations to look for streamlined ways to manage complexities and get the most out of their data-related investments. The companies that do this correctly are realizing the power of big data for business expansion and growth.
Learn why testing your enterprise's data is pivotal for success with big data, Hadoop and NoSQL. Learn how to increase your testing speed, boost your testing coverage (up to 100%), and improve the level of quality within your data warehouse - all with one ETL testing tool.
This information is geared towards:
- Big Data & Data Warehouse Architects,
- ETL Developers
- ETL Testers, Big Data Testers
- Data Analysts
- Operations teams
- Business Intelligence (BI) Architects
- Data Management Officers & Directors
You will learn how to:
- Improve your Data Quality
- Accelerate your data testing cycles
- Reduce your costs & risks
- Provide a huge ROI (as high as 1,300%)
Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I’ll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems and hopefully set off light bulbs in your head on how big data can help your organization make better business decisions.
This document discusses analytics and IoT. It covers key topics like data collection from IoT sensors, data storage and processing using big data tools, and performing descriptive, predictive, and prescriptive analytics. Cloud platforms and visualization tools that can be used to build end-to-end IoT and analytics solutions are also presented. The document provides an overview of building IoT solutions for collecting, analyzing, and gaining insights from sensor data.
Bridging the Last Mile: Getting Data to the People Who Need It (APAC) - Denodo
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Build the data lake, but avoid building the data swamp! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl... - Certus Solutions
Snowflake is a cloud data warehouse that provides elasticity, scalability, and simplicity. It allows organizations to consolidate their diverse data sources in one place and instantly scale up or down their compute capacity as needed. Aptus Health, a digital marketing company, used Snowflake to break down data silos, integrate disparate data sources, enable broad data sharing, and provide a scalable and cost-effective solution to meet their analytics needs. Snowflake addressed both business needs for timely access to centralized data and IT needs for flexibility, extensibility, and reducing ETL work.
Serverless SQL provides a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without having to move it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by allowing various users like developers, data engineers, and analysts flexible access to data and analytics.
The document discusses Azure Data Lake and U-SQL. It provides an overview of the Data Lake approach to storing and analyzing data compared to traditional data warehousing. It then describes Azure Data Lake Storage and Azure Data Lake Analytics, which provide scalable data storage and an analytics service built on Apache YARN. U-SQL is introduced as a language that unifies SQL and C# for querying data in Data Lakes and other Azure data sources.
High-performance database technology for rock-solid IoT solutions - Clusterpoint
Clusterpoint is a privately held database software company founded in 2006 with 32 employees. Their product is a hybrid operational database, analytics, and search platform that provides secure, high-performance distributed data management at scale. It reduces total cost of ownership by 80% over traditional relational databases by providing blazing fast performance, unlimited scalability, and bulletproof transactions with instant text search and security. Clusterpoint also offers their database software as a cloud database as a service to instantly scale databases on demand.
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics - Informatica
This presentation is geared toward enterprise architects and senior IT leaders looking to drive more value from their data by learning about cloud data lake management.
As businesses focus on leveraging big data to drive digital transformation, technology leaders are struggling to keep pace with the high volume of data coming in at high speed and rapidly evolving technologies. What's needed is an approach that helps you turn petabytes into profit.
Cloud data lakes and cloud data warehouses have emerged as a popular architectural pattern to support next-generation analytics. Informatica's comprehensive AI-driven cloud data lake management solution natively ingests, streams, integrates, cleanses, governs, protects and processes big data workloads in multi-cloud environments.
Please leave any questions or comments below.
So you got a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, storing it, to visualizing it, I will show you Microsoft’s solutions for every step of the way.
The document discusses challenges with traditional data warehousing and analytics including high upfront costs, difficulty managing infrastructure, and inability to scale easily. It introduces Amazon Web Services (AWS) and Amazon Redshift as a solution, allowing for easy setup of data warehousing and analytics in the cloud at low costs without large upfront investments. AWS services like Amazon Redshift provide flexible, scalable infrastructure that is easier to manage than traditional on-premise systems and enables organizations to more effectively analyze large amounts of data.
ADV Slides: Building and Growing Organizational Analytics with Data Lakes - DATAVERSITY
Data lakes are providing immense value to organizations embracing data science.
In this webinar, William will discuss the value of having broad, detailed, and seemingly obscure data available in cloud storage for purposes of expanding Data Science in the organization.
Data Virtualization for Data Architects (New Zealand) - Denodo
Watch full webinar here: https://bit.ly/3ogCJKC
Success or failure in the digital age will be determined by how effectively organisations manage their data. The speed, diversity and volume of data present today can overwhelm older data architectures, leaving business leaders lacking the insight and operational agility needed to respond to market opportunity or competitive challenges.
With the pace of today’s business, modernisation of a data architecture must be seamless, and ideally, built on existing capabilities. This webinar explores how data virtualization can help provide a seamless evolution to the capabilities of an existing data architecture without business disruption.
You will discover:
- How to modernise your data architectures without disturbing the existing analytical workload
- How to extend your data architecture to more quickly exploit existing, and new sources of data
- How to enable your data architecture to present more low latency data
1 Introduction to Microsoft data platform analytics for release - Jen Stirrup
Part 1 of a conference workshop. This forms the morning session, which looks at moving from Business Intelligence to Analytics.
Topics Covered: Azure Data Explorer, Azure Data Factory, Azure Synapse Analytics, Event Hubs, HDInsight, Big Data
Fabric Data Factory Pipeline Copy Perf Tips.pptx - Mark Kromer
This document provides performance tips for pipelines and copy activities in Azure Data Factory (ADF). It discusses:
- Using pipelines for data orchestration with conditional execution and parallel activities.
- The Copy activity provides massive-scale data movement within pipelines. Using Copy for ELT can land data quickly into a data lake.
- Gaining more throughput by using multiple parallel Copy activities but this can overload the source.
- Optimizing copy performance by using binary format, file lists/folders instead of individual files, and SQL source partitioning.
- Metrics showing copying Parquet files to a lakehouse at 5.1 GB/s while CSV and SQL loads were slower due to transformation.
Build data quality rules and data cleansing into your data pipelines - Mark Kromer
This document provides guidance on building data quality rules and data cleansing into data pipelines. It discusses considerations for data quality in data warehouse and data science scenarios, including verifying data types and lengths, handling null values, domain value constraints, and reference data lookups. It also provides examples of techniques for replacing values, splitting data based on values, data profiling, pattern matching, enumerations/lookups, de-duplicating data, fuzzy joins, validating metadata rules, and using assertions.
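The deck applies these rules with ADF mapping data flow transformations; as a hedged illustration of the same kinds of checks in code, here is a small PySpark sketch in which the column names, domain values, and rules are assumptions for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.createDataFrame(
    [(1, "GB", 120.0), (1, "GB", 120.0), (2, None, 75.5), (3, "XX", -10.0)],
    ["order_id", "country", "amount"],
)

valid_countries = ["GB", "US", "DE"]   # assumed reference/domain values

cleaned = (raw
    .dropDuplicates(["order_id"])                       # de-duplicate on the business key
    .fillna({"country": "UNKNOWN"})                     # replace null domain values
    .withColumn("amount_ok", F.col("amount") >= 0))     # assertion-style rule kept as a flag

# Split rows that violate the rules into a quarantine set for review.
good = cleaned.filter(F.col("country").isin(valid_countries) & F.col("amount_ok"))
quarantine = cleaned.subtract(good)

good.show()
quarantine.show()
```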
Mapping Data Flows Training deck Q1 CY22 - Mark Kromer
Mapping data flows allow for code-free data transformation at scale using an Apache Spark engine within Azure Data Factory. Key points:
- Mapping data flows can handle structured and unstructured data using an intuitive visual interface without needing to know Spark, Scala, Python, etc.
- The data flow designer builds a transformation script that is executed on a JIT Spark cluster within ADF. This allows for scaled-out, serverless data transformation.
- Common uses of mapping data flows include ETL scenarios like slowly changing dimensions, analytics tasks like data profiling, cleansing, and aggregations.
Data cleansing and prep with synapse data flows - Mark Kromer
This document provides resources for data cleansing and preparation using Azure Synapse Analytics Data Flows. It includes links to videos, documentation, and a slide deck that explain how to use Data Flows for tasks like deduplicating null values, saving data profiler summary statistics, and using metadata functions. A GitHub link shares a tutorial document for a hands-on learning experience with Synapse Data Flows.
Data cleansing and data prep with synapse data flows - Mark Kromer
This document contains links to resources about using Azure Synapse Analytics for data cleansing and preparation with Data Flows. It includes links to videos and documentation about removing null values, saving data profiler summary statistics, and using metadata functions in Azure Data Factory data flows.
Mapping Data Flows Perf Tuning April 2021 - Mark Kromer
This document discusses optimizing performance for data flows in Azure Data Factory. It provides sample timing results for various scenarios and recommends settings to improve performance. Some best practices include using memory optimized Azure integration runtimes, maintaining current partitioning, scaling virtual cores, and optimizing transformations and sources/sinks. The document also covers monitoring flows to identify bottlenecks and global settings that affect performance.
This document discusses using Azure Data Factory (ADF) for data lake ETL processes in the cloud. It describes how ADF can ingest data from on-premises, cloud, and SaaS sources into a data lake for preparation, transformation, enrichment, and serving to downstream analytics or machine learning processes. The document also provides several links to YouTube videos and articles about using ADF for these tasks.
Azure Data Factory Data Wrangling with Power Query - Mark Kromer
Azure Data Factory now allows users to perform data wrangling tasks through Power Query activities, translating M scripts into ADF data flow scripts executed on Apache Spark. This enables code-free data exploration, preparation, and operationalization of Power Query workflows within ADF pipelines. Examples of use cases include data engineers building ETL processes or analysts operationalizing existing queries to prepare data for modeling, with the goal of providing a data-first approach to building data flows and pipelines in ADF.
Azure Data Factory Data Flow Performance Tuning 101 - Mark Kromer
The document provides performance timing results and recommendations for optimizing Azure Data Factory data flows. Sample 1 processed a 421MB file with 887k rows in 4 minutes using default partitioning on an 80-core Azure IR. Sample 2 processed a table with the same size and transforms in 3 minutes using source and derived column partitioning. Sample 3 processed the same size file in 2 minutes with default partitioning. The document recommends partitioning strategies, using memory optimized clusters, and scaling cores to improve performance.
Azure Data Factory Data Flows Training (Sept 2020 Update) - Mark Kromer
Mapping data flows allow for code-free data transformation using an intuitive visual interface. They provide resilient data flows that can handle structured and unstructured data using an Apache Spark engine. Mapping data flows can be used for common tasks like data cleansing, validation, aggregation, and fact loading into a data warehouse. They allow transforming data at scale through an expressive language without needing to know Spark, Scala, Python, or manage clusters.
Data quality patterns in the cloud with ADF - Mark Kromer
Azure Data Factory can be used to build modern data warehouse patterns with Azure SQL Data Warehouse. It allows extracting and transforming relational data from databases and loading it into Azure SQL Data Warehouse tables optimized for analytics. Data flows in Azure Data Factory can also clean and join disparate data from Azure Storage, Data Lake Store, and other data sources for loading into the data warehouse. This provides simple and productive ETL capabilities in the cloud at any scale.
Azure Data Factory Data Flows Training v005 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows building data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be authored and designed visually, with transformations, expressions, and results previews, and then operationalized with Data Factory scheduling, monitoring, and control flow.
Data Quality Patterns in the Cloud with Azure Data Factory - Mark Kromer
This document discusses data quality patterns when using Azure Data Factory (ADF). It presents two modern data warehouse patterns that use ADF for orchestration: one using traditional ADF activities and another leveraging ADF mapping data flows. It also provides links to additional resources on ADF data flows, data quality patterns, expressions, performance, and connectors.
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be operationalized with Data Factory's scheduling, control flow, and monitoring capabilities.
ADF Mapping Data Flows Training Slides V1 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine to transform data at scale in the cloud in a resilient manner for big data scenarios involving unstructured data. Mapping Data Flows can be operationalized with Azure Data Factory's scheduling, control flow, and monitoring capabilities.
Azure Data Factory ETL Patterns in the Cloud - Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
Microsoft Azure Big Data Analytics
1. Big Data Analytics in the Cloud
Microsoft Azure
Cortana Intelligence Suite
Mark Kromer
Microsoft Azure Cloud Data Architect
@kromerbigdata
@mssqldude
2. What is Big Data Analytics?
Tech Target: "… the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information."
Techopedia: "… the strategy of analyzing large volumes of data, or big data. This big data is gathered from a wide variety of sources, including social networks, videos, digital images, sensors, and sales transaction records. The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it. Through this insight, businesses may be able to gain an edge over their rivals and make superior business decisions."
Requires lots of data wrangling and Data Engineers
Requires Data Scientists to uncover patterns from complex raw data
Requires Business Analysts to provide business value from multiple data sources
Requires additional tools and infrastructure not provided by traditional database and BI technologies
Why Cloud for Big Data Analytics?
• Quick and easy to stand-up new, large, big data architectures
• Elastic scale
• Metered pricing
• Quickly evolve architectures to rapidly changing landscapes
5. What it is: Microsoft's Cloud Platform including IaaS, PaaS and SaaS
• Storage and Data
• Networking
• Security
• Services
• Virtual Machines
When to use it:
• On-demand Resources and Services
6. What it is: A pipeline system to move data in, perform activities on data, move data around, and move data out
When to use it:
• Create solutions using multiple tools as a single process
• Orchestrate processes - Scheduling
• Monitor and manage pipelines
• Call and re-train Azure ML models
9. Example - Churn
Azure Data Factory stages: Ingest → Transform & Analyze → Publish
Data Sources: Call Log Files, Customer Table
Outputs: Customer Churn Table (Customers Likely to Churn, Customer Call Details)
10. Simple ADF:
• Business Goal: Transform and Analyze Web Logs each month
• Design Process: Transform Raw Weblogs, using a Hive Query, storing the results in Blob Storage
Flow: Web Logs loaded to Blob → HDInsight HIVE query to transform log entries → Files ready for analysis and use in AzureML
11. PowerShell ADF Example
1. Add-AzureAccount and enter the user name and password
2. Get-AzureSubscription to view all the subscriptions for this account.
3. Select-AzureSubscription to select the subscription that you want to work with.
4. Switch-AzureMode AzureResourceManager
5. New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
6. New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup –Name DataFactory(your alias)Pipeline –Location "West US"
12. Using Visual Studio
• Use in mature dev environments
• Use when integrated into larger development process
13. What it is: A Scaling Data Warehouse Service in the Cloud
When to use it:
• When you need a large-data BI solution in the cloud
• MPP SQL Server in the Cloud
• Elastic scale data warehousing
• When you need pause-able scale-out compute
14. Elastic scale & performance
Real-time elasticity
Resize in <1 minute On-demand compute
Expand or reduce
as needed
Pause Data Warehouse to Save
on Compute Costs. I.e. Pause
during non-business hours
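As an illustration, compute can be rescaled with a single T-SQL statement run against the logical server's master database (the database name and target service objective below are hypothetical); pausing and resuming compute is done from the Azure portal or PowerShell rather than T-SQL.

-- Scale an Azure SQL Data Warehouse to a larger compute tier (DWUs)
ALTER DATABASE MyDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW400');

Only compute changes with this statement; the underlying storage is unaffected.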
15. Elastic scale & performance: Scale
• Storage can be as big or small as required
• Users can execute niche workloads without re-scanning data
18. SELECT COUNT_BIG(*) FROM dbo.[FactInternetSales];
[Figure: the same query is fanned out in parallel across the Compute nodes, coordinated by the Control node]
19. What it is: Data storage (Web-HDFS) and Distributed Data Processing (HIVE, Spark, HBase, Storm, U-SQL) Engines
When to use it:
• Low-cost, high-throughput data store
• Non-relational data
• Larger storage limits than Blobs
20. Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop and ADLA
Workloads: interactive queries, batch queries, machine learning, data warehouse, real-time analytics, devices
22. Introducing ADLS
Azure Data Lake Store: a hyper scale repository for big data analytics workloads
• No limits to SCALE
• Store ANY DATA in its native format
• HADOOP FILE SYSTEM (HDFS) for the cloud
• Optimized for analytic workload PERFORMANCE
• ENTERPRISE GRADE authentication, access control, audit, encryption at rest
23. No limits to scale
• No fixed limits on: amount of data stored, how long data can be stored, number of files, size of the individual files, ingestion/egress throughput
• Seamlessly scales from a few KBs to several PBs
24. No limits to storage
• Each file in ADL Store is sliced into blocks
• Blocks are distributed across multiple data nodes in the backend storage system
• With a sufficient number of backend storage data nodes, files of any size can be stored
• Backend storage runs in the Azure cloud, which has virtually unlimited resources
• Metadata is stored about each file; no limit to metadata either
[Figure: an Azure Data Lake Store file split into blocks, distributed across backend storage data nodes]
25. Massive throughput
• Through read parallelism, ADL Store provides massive throughput
• Each read operation on an ADL Store file results in multiple read operations executed in parallel against the backend storage data nodes
[Figure: a read operation on an Azure Data Lake Store file fanned out in parallel across backend storage data nodes]
26. Enterprise grade security
• Enterprise-grade security permits even sensitive data to be stored securely
• Regulatory compliance can be enforced
• Integrates with Azure Active Directory for authentication
• Data is encrypted at rest and in flight
• POSIX-style permissions on files and directories
• Audit logs for all operations
27. Enterprise grade availability and reliability
• Azure maintains 3 replicas of each data object per region across three fault and upgrade domains
• Each create or append operation on a replica is replicated to the other two
• Writes are committed to the application only after all replicas are successfully updated
• Read operations can go against any replica
• Provides 'read-after-write' consistency
• Data is never lost or unavailable even under failures
[Figure: a write committed across Replica 1, Replica 2 and Replica 3 in separate fault/upgrade domains]
28. All data
Productivity from day one
Easy and powerful data preparation
Limitless scale
Enterprise-grade
32. What is U-SQL?
A hyper-scalable, highly extensible language for preparing, transforming and analyzing all data
Allows users to focus on the what—not the how—of business problems
Built on familiar languages (SQL and C#) and supported by a fully integrated development environment
Built for data developers & scientists
33. REFERENCE MyDB.MyAssembly;
CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , SUM(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
        && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;
35. EXTRACT Expression
@s = EXTRACT a string, b int
FROM "filepath/file.csv"
USING Extractors.Csv(encoding: Encoding.Unicode);
• Built-in Extractors: Csv, Tsv, Text with lots of options
• Custom Extractors: e.g., JSON, XML, etc.
OUTPUT Expression
OUTPUT @s
TO "filepath/file.csv"
USING Outputters.Csv();
• Built-in Outputters: Csv, Tsv, Text
• Custom Outputters: e.g., JSON, XML, etc. (see http://usql.io)
Filepath URIs
• Relative URI to default ADL Storage account: "filepath/file.csv"
• Absolute URIs:
• ADLS: "adl://account.azuredatalakestore.net/filepath/file.csv"
• WASB: "wasb://container@account/filepath/file.csv"
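To show how these pieces fit together end to end, here is a minimal complete U-SQL sketch; the input/output paths and column names are hypothetical, and it uses only the built-in Csv/Tsv extractor and outputter described above.

@rows =
    EXTRACT name string,
            amount int
    FROM "/data/input.csv"     // relative URI resolves against the default ADL Storage account
    USING Extractors.Csv();

@totals =
    SELECT name, SUM(amount) AS total
    FROM @rows
    GROUP BY name;

OUTPUT @totals
TO "/data/totals.tsv"
USING Outputters.Tsv();        // built-in TSV outputter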
36. • Create assemblies
• Reference assemblies
• Enumerate assemblies
• Drop assemblies
• VisualStudio makes registration easy!
• CREATE ASSEMBLY db.assembly FROM @path;
• CREATE ASSEMBLY db.assembly FROM byte[];
• Can also include additional resource files
• REFERENCE ASSEMBLY db.assembly;
• Referencing .Net Framework Assemblies
• Always accessible system namespaces:
• U-SQL specific (e.g., for SQL.MAP)
• All provided by System.dll, System.Core.dll, System.Data.dll, System.Runtime.Serialization.dll, mscorlib.dll (e.g., System.Text, System.Text.RegularExpressions, System.Linq)
• Add all other .Net Framework Assemblies with:
REFERENCE SYSTEM ASSEMBLY [System.XML];
• Enumerating Assemblies
• PowerShell command
• U-SQL Studio Server Explorer
• DROP ASSEMBLY db.assembly;
37. 'USING' csharp_namespace
| Alias '=' csharp_namespace_or_class.
Examples:
DECLARE @input string = "somejsonfile.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@data0 =
EXTRACT IPAddresses string
FROM @input
USING new JsonExtractor("Devices[*]");
USING json =
[Microsoft.Analytics.Samples.Formats.Json.JsonExtractor];
@data1 =
EXTRACT IPAddresses string
FROM @input
USING new json("Devices[*]");
38. Simple pattern language on filename and path
@pattern string =
"/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";
• Binds two columns date and suffix
• Wildcards the filename
• Limits on number of files
(Current limit 800 and 3000 being increased in next refresh)
Virtual columns
EXTRACT
name string
, suffix string // virtual column
, date DateTime // virtual column
FROM @pattern
USING Extractors.Csv();
• Refer to virtual columns in query predicates to get partition
elimination
(otherwise you will get a warning)
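A short sketch of how the virtual columns drive partition elimination (the path pattern follows the slide above; the date range and output path are hypothetical):

DECLARE @pattern string =
    "/input/{date:yyyy}/{date:MM}/{date:dd}/{*}.{suffix}";

@data =
    EXTRACT name string,
            suffix string,   // virtual column bound from the file name
            date DateTime    // virtual column bound from the folder path
    FROM @pattern
    USING Extractors.Csv();

// Filtering on the virtual column lets the compiler read only the matching folders
@january =
    SELECT name, date
    FROM @data
    WHERE date >= DateTime.Parse("2016-01-01")
       && date < DateTime.Parse("2016-02-01");

OUTPUT @january TO "/output/january.csv" USING Outputters.Csv();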
39. • Naming
• Discovery
• Sharing
• Securing
U-SQL Catalog Naming
• Default Database and Schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands
Sharing
• Within an Azure Data Lake Analytics account
Securing
• Secured with AAD principals at catalog and Database level
40. Views
CREATE VIEW V AS EXTRACT…
CREATE VIEW V AS SELECT …
• Cannot contain user-defined objects (e.g. UDF or UDOs)!
• Will be inlined
Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default")
RETURNS @res [TABLE ( … )]
AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user-code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
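As a concrete sketch, here is a filled-in TVF following the skeleton above (the function name, path and columns are hypothetical), plus a call site:

CREATE FUNCTION IF NOT EXISTS dbo.GetOrdersByCity (@city string = "Seattle")
RETURNS @res TABLE (cid int, odate DateTime, amount float)
AS
BEGIN
    @o = EXTRACT cid int, odate DateTime, amount float, city string
         FROM "/input/orders.txt"
         USING Extractors.Csv();

    @res = SELECT cid, odate, amount
           FROM @o
           WHERE city == @city;
END;

// Usage (in a later script): the TVF is referenced like a rowset and inlined into the caller
@seattle = dbo.GetOrdersByCity(DEFAULT);
OUTPUT @seattle TO "/output/seattle_orders.csv" USING Outputters.Csv();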
41. CREATE PROCEDURE P (@arg string = "default") AS
BEGIN
…;
OUTPUT @res TO …;
INSERT INTO T …;
END;
• Provides parameterization
• No result but writes into file or table
• Can contain multiple statements
• Can contain user-code (needs assembly reference)
• Will always be inlined
• Can contain DDL (but no CREATE, DROP FUNCTION/PROCEDURE)
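A minimal filled-in procedure in the same spirit (name, path and columns are hypothetical); unlike a TVF it returns no rowset and instead writes its results out:

CREATE PROCEDURE IF NOT EXISTS dbo.ExportCityOrders (@city string = "Seattle")
AS
BEGIN
    @o = EXTRACT cid int, odate DateTime, amount float, city string
         FROM "/input/orders.txt"
         USING Extractors.Csv();

    @filtered = SELECT cid, odate, amount
                FROM @o
                WHERE city == @city;

    OUTPUT @filtered TO "/output/city_orders.csv" USING Outputters.Csv();
END;

// Invoked as a statement in a later script:
dbo.ExportCityOrders("New York");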
42. CREATE TABLE T (col1 int
    , col2 string
    , col3 SQL.MAP<string,string>
    , INDEX idx CLUSTERED (col2 ASC)
      PARTITION BY (col1)
      DISTRIBUTED BY HASH (col2)
    );
• Structured Data, built-in Data types only (no UDTs)
• Clustered Index (needs to be specified): row-oriented
• Fine-grained distribution (needs to be specified):
• HASH, DIRECT HASH, RANGE, ROUND ROBIN
• Addressable Partitions (optional)
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT…;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and distribution (does not support partitioning)
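A self-contained CTAS sketch following the syntax above (paths, table name and columns are hypothetical):

@orders = EXTRACT cid int, odate DateTime, amount float
          FROM "/input/orders.txt"
          USING Extractors.Csv();

CREATE TABLE IF NOT EXISTS dbo.OrderTotals
(
    INDEX idx CLUSTERED (cid ASC)
    DISTRIBUTED BY HASH (cid)
)
AS SELECT cid, SUM(amount) AS totalamount
   FROM @orders
   GROUP BY cid;

The schema of dbo.OrderTotals is inferred from the SELECT, while the index and distribution are still specified explicitly.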
43. Benefits of Table clustering and distribution
• Faster lookup of data provided by distribution and clustering when the right distribution/cluster is chosen
• Data distribution provides better localized scale out
• Used for filters, joins and grouping
Benefits of Table partitioning
• Provides data life cycle management ("expire" old partitions)
• Partial re-computation of data at partition level
• Query predicates can provide partition elimination
Do not use when…
• No filters, joins and grouping
• No reuse of the data for future queries
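For partition life-cycle management, the usual pattern is to add a partition before loading rows whose partition key maps to it, and to drop old partitions to expire them. A rough sketch, assuming the ALTER TABLE … ADD/DROP PARTITION syntax and the partitioned table T from slide 42 (the partition key values are hypothetical):

ALTER TABLE T ADD PARTITION (1);    // create the partition bucket for rows with col1 = 1 before loading
ALTER TABLE T DROP PARTITION (0);   // "expire" an old partition; a metadata-level operation

Rows inserted into T land in the partition matching their col1 value, and query predicates on col1 let the compiler eliminate partitions that are not needed.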
44. ALTER TABLE T ADD COLUMN eventName string;
ALTER TABLE T DROP COLUMN col3;
ALTER TABLE T ADD COLUMN result string, clientId string, payload int?;
ALTER TABLE T DROP COLUMN clientId, result;
• Meta-data only operation
• Existing rows will get
• Non-nullable types: C# data type default value (e.g., int will be 0)
• Nullable types: null
50. Visual Studio fully supports authoring U-SQL scripts
While editing, it provides:
• IntelliSense
• Syntax color coding
• Syntax checking
• …
[Figure: contextual menu in the editor]
51. C# code to extend U-SQL can be authored and used directly in U-SQL Studio, without having to first create and register an external assembly.
[Figure: custom processor example]
52. Jobs can be submitted directly from Visual Studio in two ways
You have to be logged into Azure and have to specify the target Azure Data Lake account.
53. Each job is broken into 'n' number of vertices
• Each vertex is some work that needs to be done
• Vertices are organized into stages – vertices in each stage do the same work on the same data
• A vertex in one stage may depend on a vertex in an earlier stage
• Stages themselves are organized into an acyclic graph
[Figure: example job graph with an input, two outputs, 6 stages and 8 vertices]
54. Job execution graph
After a job is submitted, the progress of the execution of the job as it goes through the different stages is shown and updated continuously.
Important stats about the job are also displayed and updated continuously.
56. ADL Analytics creates and stores a set of metadata objects in a catalog maintained by a metadata service
Tables and TVFs are created by DDL statements (CREATE TABLE …)
Metadata objects can be created directly through the Server Explorer
Azure Data Lake Analytics account
Databases
– Tables
– Table valued functions
– Jobs
– Schemas
Linked storage
57. The metadata catalog can be browsed with the Visual Studio Server Explorer
Server Explorer lets you:
1. Create new tables, schemas and databases
2. Register assemblies
58. What it is: Microsoft's implementation of Apache Hadoop (as a service) that uses Blobs for persistent storage
When to use it:
• When you need to process large scale data (PB+)
• When you want to use Hadoop or Spark as a service
• When you want to compute data and retire the servers, but retain the results
• When your team is familiar with the Hadoop Zoo
66. What it is: A multi-platform environment and engine to create and deploy Machine Learning models and APIs
When to use it:
• When you need to create predictive analytics
• When you need to share Data Science experiments across teams
• When you need to create callable APIs for ML functions
• When you also have R and Python experience on your Data Science team
67. Development Environment
• Creating Experiments
• Sharing a Workspace
Deployment Environment
• Publishing the Model
• Using the API
• Consuming in various tools
72. What it is: Interactive Report and Visualization creation for computing and mobile platforms
When to use it:
• When you need to create and view interactive reports that combine multiple datasets
• When you need to embed reporting into an application
• When you need customizable visualizations
• When you need to create shared datasets, reports, and dashboards that you publish to your team
77. [Reference architecture figure]
• Data collection: event & data producers (applications, web and social, devices, sensors), cloud gateways (web APIs), field gateways
• Queuing system: Event hubs, Kafka/RabbitMQ/ActiveMQ
• Data transformation: Storm / Stream Analytics, Hive / U-SQL, Pig, Azure ML, Data Factory
• Data storage: DocumentDB, MongoDB, SQL Azure, ADW, HBase, Blob Storage
• Presentation and action: Azure Search, data analytics (Excel, Power BI, Looker, Tableau), web/thick client dashboards, live dashboards, devices to take action
#6: What you can do with it: https://azure.microsoft.com/en-us/overview/what-is-azure/
Platform: http://microsoftazure.com
Storage: https://azure.microsoft.com/en-us/documentation/services/storage/
Networking: https://azure.microsoft.com/en-us/documentation/services/virtual-network/
Security: https://azure.microsoft.com/en-us/documentation/services/active-directory/
Services: https://azure.microsoft.com/en-us/documentation/articles/best-practices-scalability-checklist/
Virtual Machines: https://azure.microsoft.com/en-us/documentation/services/virtual-machines/windows/ and https://azure.microsoft.com/en-us/documentation/services/virtual-machines/linux/
PaaS: https://azure.microsoft.com/en-us/documentation/services/app-service/
#7: Azure Data Factory: http://azure.microsoft.com/en-us/services/data-factory/
#10: Video of this process: https://azure.microsoft.com/en-us/documentation/videos/azure-data-factory-102-analyzing-complex-churn-models-with-azure-data-factory/
#11: More options: Prepare System: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/ - Follow steps
Another Lab: https://azure.microsoft.com/en-us/documentation/articles/data-factory-samples/
#13: Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
#14: Azure SQL Data Warehouse: http://azure.microsoft.com/en-us/services/sql-data-warehouse/
#20: Azure Data Lake: http://azure.microsoft.com/en-us/campaigns/data-lake/
#29: All data
Unstructured, Semi structured, Structured
Domain-specific user defined types using C#
Queries over Data Lake and Azure Blobs
Federated Queries over Operational and DW SQL stores removing the complexity of ETL
Productive from day one
Effortless scale and performance without need to manually tune/configure
Best developer experience throughout development lifecycle for both novices and experts
Leverage your existing skills with SQL and .NET
Easy and powerful data preparation
Easy to use built-in connectors for common data formats
Simple and rich extensibility model for adding customer-specific data transformation – both existing and new
No limits scale
Scales on demand with no change to code
Automatically parallelizes SQL and custom code
Designed to process petabytes of data
Enterprise grade
Managing, securing, sharing, and discovery of familiar data and code objects (tables, functions etc.)
Role based authorization of Catalogs and storage accounts using AAD security
Auditing of catalog objects (databases, tables, etc.)
#31: ADLA allows you to compute on data anywhere and join data from multiple cloud sources.
#60: Primary site: https://azure.microsoft.com/en-us/services/hdinsight/
Quick overview: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/
4-week online course through the edX platform: https://www.edx.org/course/processing-big-data-azure-hdinsight-microsoft-dat202-1x
11 minute introductory video: https://channel9.msdn.com/Series/Getting-started-with-Windows-Azure-HDInsight-Service/Introduction-To-Windows-Azure-HDInsight-Service
Microsoft Virtual Academy Training (4 hours) - https://mva.microsoft.com/en-US/training-courses/big-data-analytics-with-hdinsight-hadoop-on-azure-10551?l=UJ7MAv97_5804984382
Learning path for HDInsight: https://azure.microsoft.com/en-us/documentation/learning-paths/hdinsight-self-guided-hadoop-training/
Azure Feature Pack for SQL Server 2016, i.e., SSIS (SQL Server Integration Services): https://msdn.microsoft.com/en-us/library/mt146770(v=sql.130).aspx
#66: Azure Portal: http://azure.portal.com
Provisioning Clusters: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-provision-clusters/
Different clusters have different node types, number of nodes, and node sizes.
#68: Guided tutorials: https://azure.microsoft.com/en-us/documentation/services/machine-learning/
Microsoft Azure Virtual Academy course: https://mva.microsoft.com/en-US/training-courses/microsoft-azure-machine-learning-jump-start-8425?l=ehQZFoKz_7904984382
#70: Designing an experiment in the Studio: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-ml-studio/
#73: Power BI: https://powerbi.microsoft.com/