Mapping Data Flows Training Deck, Q1 CY22 – Mark Kromer
Mapping data flows allow for code-free data transformation at scale using an Apache Spark engine within Azure Data Factory. Key points:
- Mapping data flows can handle structured and unstructured data using an intuitive visual interface without needing to know Spark, Scala, Python, etc.
- The data flow designer builds a transformation script that is executed on a JIT Spark cluster within ADF. This allows for scaled-out, serverless data transformation.
- Common uses of mapping data flows include ETL scenarios like slowly changing dimensions, and analytics tasks like data profiling, cleansing, and aggregation.
Azure Data Factory Data Flows Training (Sept 2020 Update) – Mark Kromer
Mapping data flows allow for code-free data transformation using an intuitive visual interface. They provide resilient data flows that can handle structured and unstructured data using an Apache Spark engine. Mapping data flows can be used for common tasks like data cleansing, validation, aggregation, and fact loading into a data warehouse. They allow transforming data at scale through an expressive language without needing to know Spark, Scala, Python, or manage clusters.
This document provides an overview of Azure Databricks, including:
- Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure cloud services. It includes Spark SQL, streaming, machine learning libraries, and integrates fully with Azure services.
- Clusters in Azure Databricks provide a unified platform for various analytics use cases. The workspace stores notebooks, libraries, dashboards, and folders. Notebooks provide a code environment with visualizations. Jobs and alerts can run and notify on notebooks.
- The Databricks File System (DBFS) stores files in Azure Blob storage in a distributed file system accessible from notebooks. Business intelligence tools can connect to Databricks clusters via JDBC
This presentation is for those of you who are interested in moving your on-prem SQL Server databases and servers to Azure virtual machines (VM’s) in the cloud so you can take advantage of all the benefits of being in the cloud. This is commonly referred to as a “lift and shift” as part of an Infrastructure-as-a-service (IaaS) solution. I will discuss the various Azure VM sizes and options, migration strategies, storage options, high availability (HA) and disaster recovery (DR) solutions, and best practices.
Azure Data Factory is a data integration service that allows for data movement and transformation between both on-premises and cloud data stores. It uses datasets to represent data structures, activities to define actions on data with pipelines grouping related activities, and linked services to connect to external resources. Key concepts include datasets representing input/output data, activities performing actions like copy, and pipelines logically grouping activities.
Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. In this session we will learn how to create data integration solutions using the Data Factory service and ingest data from various data stores, transform/process the data, and publish the result data to the data stores.
Azure data analytics platform - A reference architecture – Rajesh Kumar
This document provides an overview of Azure data analytics architecture using the Lambda architecture pattern. It covers Azure data and services, including ingestion, storage, processing, analysis and interaction services. It provides a brief overview of the Lambda architecture including the batch layer for pre-computed views, speed layer for real-time views, and serving layer. It also discusses Azure data distribution, SQL Data Warehouse architecture and design best practices, and data modeling guidance.
This document provides an overview of using Azure Data Factory (ADF) for ETL workflows. It discusses the components of modern data engineering, how to design ETL processes in Azure, an overview of ADF and its components. It also previews a demo on creating an ADF pipeline to copy data into Azure Synapse Analytics. The agenda includes discussions of data ingestion techniques in ADF, components of ADF like linked services, datasets, pipelines and triggers. It concludes with references, a Q&A section and a request for feedback.
This document discusses Microsoft's cloud platform offerings for SAP software. It outlines Microsoft's cloud infrastructure services for SAP applications including Azure, analytics and insights solutions, and productivity and mobile solutions. It then provides more details on specific Azure capabilities for SAP like SAP HANA, SAP NetWeaver, and SAP deployment scenarios. The document aims to showcase how Microsoft's cloud platform can help SAP customers reduce costs and complexity while improving agility.
Microsoft Azure is the only hybrid cloud to help you migrate your apps, data, and infrastructure with cost-effective and flexible paths. At this event you’ll learn how thousands of customers have migrated to Azure, at their own pace and with high confidence by using a reliable methodology, flexible and powerful tools, and proven partner expertise. Come to this event to learn how Azure can help you save—before, during, and after migration, and how it offers unmatched value during every stage of your cloud migration journey. Learn about assessments, migration offers, and cost management tools to help you migrate with confidence.
Delta Lake brings reliability, performance, and security to data lakes. It provides ACID transactions, schema enforcement, and unified handling of batch and streaming data to make data lakes more reliable. Delta Lake also features lightning fast query performance through its optimized Delta Engine. It enables security and compliance at scale through access controls and versioning of data. Delta Lake further offers an open approach and avoids vendor lock-in by using open formats like Parquet that can integrate with various ecosystems.
The document discusses Azure Data Factory V2 data flows. It will provide an introduction to Azure Data Factory, discuss data flows, and have attendees build a simple data flow to demonstrate how they work. The speaker will introduce Azure Data Factory and data flows, explain concepts like pipelines, linked services, and data flows, and guide a hands-on demo where attendees build a data flow to join customer data to postal district data to add matching postal towns.
Azure Data Factory ETL Patterns in the Cloud – Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
Azure Data Factory Mapping Data Flow allows users to stage and transform data in Azure during a limited preview period beginning in February 2019. Data can be staged from Azure Data Lake Storage, Blob Storage, or SQL databases/data warehouses, then transformed using visual data flows before being landed to staging areas in Azure like ADLS, Blob Storage, or SQL databases. For information, contact [email protected] or visit http://aka.ms/dataflowpreview.
Modernizing to a Cloud Data Architecture – Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how elastic compute models’ benefits help one customer scale their analytics and AI workloads and best practices from their experience on a successful migration of their data and workloads to the cloud.
This document provides an overview of Azure Data Factory (ADF), including why it is used, its key components and activities, how it works, and differences between versions 1 and 2. It describes the main steps in ADF as connect and collect, transform and enrich, publish, and monitor. The main components are pipelines, activities, datasets, and linked services. Activities include data movement, transformation, and control. Integration runtime and system variables are also summarized.
In this session, Sergio covered the Lakehouse concept and how companies implement it, from data ingestion to insight. He showed how you could use Azure Data Services to speed up your Analytics project from ingesting, modelling and delivering insights to end users.
Apache Spark is a fast and general engine for large-scale data processing. It was created by UC Berkeley and is now the dominant framework in big data. Spark can run programs over 100x faster than Hadoop in memory, or more than 10x faster on disk. It supports Scala, Java, Python, and R. Databricks provides a Spark platform on Azure that is optimized for performance and integrates tightly with other Azure services. Key benefits of Databricks on Azure include security, ease of use, data access, high performance, and the ability to solve complex analytics problems.
The document discusses Azure Data Factory and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability without infrastructure to manage along with capabilities for lifting and running SQL Server Integration Services packages in Azure.
Data Quality Patterns in the Cloud with Azure Data Factory – Mark Kromer
This document discusses data quality patterns when using Azure Data Factory (ADF). It presents two modern data warehouse patterns that use ADF for orchestration: one using traditional ADF activities and another leveraging ADF mapping data flows. It also provides links to additional resources on ADF data flows, data quality patterns, expressions, performance, and connectors.
Modern DW Architecture
- The document discusses modern data warehouse architectures using Azure cloud services like Azure Data Lake, Azure Databricks, and Azure Synapse. It covers storage options like ADLS Gen 1 and Gen 2 and data processing tools like Databricks and Synapse. It highlights how to optimize architectures for cost and performance using features like auto-scaling, shutdown, and lifecycle management policies. Finally, it provides a demo of a sample end-to-end data pipeline.
Microsoft Data Platform - What's included – James Serra
This document provides an overview of a speaker and their upcoming presentation on Microsoft's data platform. The speaker is a 30-year IT veteran who has worked in various roles including BI architect, developer, and consultant. Their presentation will cover collecting and managing data, transforming and analyzing data, and visualizing and making decisions from data. It will also discuss Microsoft's various product offerings for data warehousing and big data solutions.
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
Migrating databases with minimal downtime to AWS RDS, Amazon Redshift and Amazon Aurora
Migration of databases to same and different engines and from on premise to cloud
Schema conversion from Oracle and SQL Server to MySQL and Aurora
The document discusses the data flow task in SQL Server Integration Services (SSIS). It encapsulates the data flow engine and performs ETL processes like extract, transform, and load data. Data flow components include sources that extract data, transformations that modify data, and destinations that load data. Paths connect the components and create the data flow pipeline. Sources extract from different data sources. Transformations modify data through row-level and rowset operations. Destinations load data to various targets.
The document summarizes new features in .NET 3.5 SP1, including enhancements to ADO.NET Entity Framework, ADO.NET Data Services, ASP.NET routing, and ASP.NET dynamic data. It provides an overview and demonstrations of each technology. Key points covered include using Entity Framework to bridge the gap between object-oriented and relational models, consuming entity data models via LINQ queries or object services, and using data services to expose data over HTTP in a RESTful manner.
This document discusses various data flow transformations in SQL Server Integration Services (SSIS). It begins with an introduction to the different types of transformations, including row transformations and rowset transformations. It then provides examples and demonstrations of specific transformations like Character Map, Derived Column, Aggregate, Pivot, and Percentage Sampling. The document aims to explain how each transformation works and how it can be used to modify or aggregate data in an SSIS data flow.
The document discusses Microsoft Integration Services (SSIS), which is a platform for building data integration and transformation solutions. It describes the structure of SSIS packages, which contain connections, tasks, data flows, and other elements. It provides details on the control flow, which sequences tasks and containers, and the data flow, which moves data between sources and destinations while transforming the data. It lists the various tasks, transformations, and destinations that can be used in SSIS packages to integrate, cleanse, and load data.
The document discusses key components of Informatica PowerCenter/PowerMart Designer including the designer workspace, tools, source analyzer, warehouse designer, and common transformations. The source analyzer can import relational and flat file sources. The warehouse designer creates and manages target definitions. Common transformations include expression, filter, joiner, lookup, and aggregator which can perform calculations, filtering, joining data, lookups, and aggregations respectively.
The document discusses using datasets and data relations to retrieve and manipulate related data from multiple database tables. Key points include using datasets to transfer and manipulate data without an active connection, relating tables through data relations, retrieving related rows, and updating changes to the underlying database through data adapters. Stored procedures are also covered as an alternative to passing SQL statements across the network.
This document discusses how to implement operations like selection, joining, grouping, and sorting in Cassandra without SQL. It explains that Cassandra uses a nested data model to efficiently store and retrieve related data. Operations like selection can be performed by creating additional column families that index data by fields like birthdate and allow fast retrieval of records by those fields. Joining can be implemented by nesting related entity data within the same column family. Grouping and sorting are also achieved through additional indexing column families. While this requires duplicating data for different queries, it takes advantage of Cassandra's strengths in scalable updates.
This document discusses new features in SQL Server including MERGE statements, table valued parameters, grouping sets, and FILESTREAM storage. MERGE statements allow inserting, updating, and deleting data in one statement based on matching or non-matching rows between two tables. Table valued parameters allow passing tables of data as parameters to stored procedures. Grouping sets enable grouping data by multiple columns in a single query. FILESTREAM storage integrates the database engine with the file system to allow storing large binary objects on disk for improved performance.
This document provides an overview of NoSQL databases and MongoDB. It states that NoSQL databases are more scalable and flexible than relational databases. MongoDB is described as a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB uses collections and documents to store data in a flexible, JSON-like format.
esProc is a software for data computing, query and integration within or between sql based database, data warehouse,hadoop, NoSql DB, local file, network file, excel or access. It is widely used in data migration, ETL tasks, complex event programming, big data, database parallel computing, hadoop and report development.
SQL can be used to query both streaming and batch data. Apache Flink and Apache Calcite enable SQL queries on streaming data. Flink uses its Table API and integrates with Calcite to translate SQL queries into dataflow programs. This allows standard SQL to be used for both traditional batch analytics on finite datasets and stream analytics producing continuous results from infinite data streams. Queries are executed continuously by applying operators within windows to subsets of streaming data.
This document summarizes new features in SQL Server 2008 for .NET developers, including spatial data support, BLOB storage using Filestream, enhancements to T-SQL, new date/time types, improved integration with Visual Studio, and business intelligence tools like SSAS, SSIS, and SSRS. It provides overviews of key concepts like spatial data types, using Filestream for BLOB storage, table-valued parameters, new date/time functionality, MERGE statements, shorthand notation in T-SQL, Entity Framework, SQL CLR, and Reporting Services.
Fabric Data Factory Pipeline Copy Perf Tips.pptx – Mark Kromer
This document provides performance tips for pipelines and copy activities in Azure Data Factory (ADF). It discusses:
- Using pipelines for data orchestration with conditional execution and parallel activities.
- The Copy activity provides massive-scale data movement within pipelines. Using Copy for ELT can land data quickly into a data lake.
- Gaining more throughput by using multiple parallel Copy activities, though this can overload the source.
- Optimizing copy performance by using binary format, file lists/folders instead of individual files, and SQL source partitioning.
- Metrics showing copying Parquet files to a lakehouse at 5.1 GB/s while CSV and SQL loads were slower due to transformation.
Build data quality rules and data cleansing into your data pipelines – Mark Kromer
This document provides guidance on building data quality rules and data cleansing into data pipelines. It discusses considerations for data quality in data warehouse and data science scenarios, including verifying data types and lengths, handling null values, domain value constraints, and reference data lookups. It also provides examples of techniques for replacing values, splitting data based on values, data profiling, pattern matching, enumerations/lookups, de-duplicating data, fuzzy joins, validating metadata rules, and using assertions.
Data cleansing and prep with synapse data flows – Mark Kromer
This document provides resources for data cleansing and preparation using Azure Synapse Analytics Data Flows. It includes links to videos, documentation, and a slide deck that explain how to use Data Flows for tasks like deduplicating null values, saving data profiler summary statistics, and using metadata functions. A GitHub link shares a tutorial document for a hands-on learning experience with Synapse Data Flows.
Data cleansing and data prep with synapse data flows – Mark Kromer
This document contains links to resources about using Azure Synapse Analytics for data cleansing and preparation with Data Flows. It includes links to videos and documentation about removing null values, saving data profiler summary statistics, and using metadata functions in Azure Data Factory data flows.
Mapping Data Flows Perf Tuning April 2021 – Mark Kromer
This document discusses optimizing performance for data flows in Azure Data Factory. It provides sample timing results for various scenarios and recommends settings to improve performance. Some best practices include using memory optimized Azure integration runtimes, maintaining current partitioning, scaling virtual cores, and optimizing transformations and sources/sinks. The document also covers monitoring flows to identify bottlenecks and global settings that affect performance.
This document discusses using Azure Data Factory (ADF) for data lake ETL processes in the cloud. It describes how ADF can ingest data from on-premises, cloud, and SaaS sources into a data lake for preparation, transformation, enrichment, and serving to downstream analytics or machine learning processes. The document also provides several links to YouTube videos and articles about using ADF for these tasks.
Azure Data Factory Data Wrangling with Power Query – Mark Kromer
Azure Data Factory now allows users to perform data wrangling tasks through Power Query activities, translating M scripts into ADF data flow scripts executed on Apache Spark. This enables code-free data exploration, preparation, and operationalization of Power Query workflows within ADF pipelines. Examples of use cases include data engineers building ETL processes or analysts operationalizing existing queries to prepare data for modeling, with the goal of providing a data-first approach to building data flows and pipelines in ADF.
Azure Data Factory Data Flow Performance Tuning 101 – Mark Kromer
The document provides performance timing results and recommendations for optimizing Azure Data Factory data flows. Sample 1 processed a 421MB file with 887k rows in 4 minutes using default partitioning on an 80-core Azure IR. Sample 2 processed a table with the same size and transforms in 3 minutes using source and derived column partitioning. Sample 3 processed the same size file in 2 minutes with default partitioning. The document recommends partitioning strategies, using memory optimized clusters, and scaling cores to improve performance.
Data quality patterns in the cloud with ADF – Mark Kromer
Azure Data Factory can be used to build modern data warehouse patterns with Azure SQL Data Warehouse. It allows extracting and transforming relational data from databases and loading it into Azure SQL Data Warehouse tables optimized for analytics. Data flows in Azure Data Factory can also clean and join disparate data from Azure Storage, Data Lake Store, and other data sources for loading into the data warehouse. This provides simple and productive ETL capabilities in the cloud at any scale.
Azure Data Factory Data Flows Training v005 – Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows building data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be authored and designed visually, with transformations, expressions, and results previews, and then operationalized with Data Factory scheduling, monitoring, and control flow.
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be operationalized with Data Factory's scheduling, control flow, and monitoring capabilities.
ADF Mapping Data Flows Training Slides V1 – Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine to transform data at scale in the cloud in a resilient manner for big data scenarios involving unstructured data. Mapping Data Flows can be operationalized with Azure Data Factory's scheduling, control flow, and monitoring capabilities.
SQL Saturday Redmond 2019 ETL Patterns in the Cloud – Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, scaling ETL in the cloud, handling flexible schemas, and using ADF for orchestration. Key points include staging data in low-cost storage before processing, using ADF's integration runtime to process data both on-premises and in the cloud, and building resilient data flows that can handle schema drift.
Azure Data Factory Data Flow Limited Preview for January 2019 – Mark Kromer
Azure Data Factory introduces Visual Data Flow, a limited preview feature that allows users to visually design data flows without writing code. It provides a drag-and-drop interface for users to select data sources, place transformations on imported data, and choose destinations for transformed data. The flows are run on Azure and default to using Azure Data Lake Storage for staging transformed data, though users can optionally configure other staging options. The feature supports common data formats and transformations like sorting, merging, joining, and lookups.
Microsoft Azure Data Factory Data Flow Scenarios – Mark Kromer
Visual Data Flow in Azure Data Factory provides a limited preview of data flows that allow users to visually design transformations on data. It features implicit staging of data in data lakes, explicit selection of data sources and transformations through a toolbox interface, and setting of properties for transformation steps and destination connectors. The preview is intended to get early feedback to help shape the future of visual data flows in Azure Data Factory.
Azure Data Factory Data Flow Preview December 2019 – Mark Kromer
Visual Data Flow in Azure Data Factory provides a limited preview of data flows that allow users to visually design transformations on data. It features implicit staging of data in data lakes, explicit selection of data sources and transformations through a toolbox interface, and setting of properties for transformation steps and destination connectors. The preview is intended to get early feedback to help shape the future of Visual Data Flow.
Azure Data Factory for Redmond SQL PASS UG Sept 2018 – Mark Kromer
Azure Data Factory is a fully managed data integration service in the cloud. It provides a graphical user interface for building data pipelines without coding. Pipelines can orchestrate data movement and transformations across hybrid and multi-cloud environments. Azure Data Factory supports incremental loading, on-demand Spark, and lifting SQL Server Integration Services packages to the cloud.
2. What are mapping data flows?
Code-free data transformation at scale
Serverless, scaled-out, ADF/Synapse-managed Apache Spark™ engine
Resilient flows handle structured and unstructured data
Operationalized as a data pipeline activity
3. Code-free data transformation at scale
Intuitive UX lets you focus on building transformation logic
Data cleansing
Data validation
Data aggregation
No need to know Spark, Scala, Python, or manage clusters
5. Modern Data Warehouse (MDW)
Architecture diagram: on-premises, cloud, and SaaS data is ingested, prepared, transformed/predicted/enriched, served, stored, and visualized, with data pipeline orchestration and monitoring across all stages.
14. Building transformation logic
Transformations: A ‘step’ in the data flow
Engine intelligently groups them at runtime
19 currently available
Core logic of data flow
Add/Remove/Alter Columns
Join or lookup data from datasets
Change number or order of rows
Aggregate data
Hierarchical to relational
15. Source transformation
Define the data read by your data flow
Import projection vs generic
Schema drift
Connector-specific properties and optimizations
Min: 1, Max: ∞
Define in-line or use dataset
16. Source: In-line vs dataset
Define all source properties within a data flow or use a separate entity to store them
Dataset:
Reusable in other ADF activities such as Copy
Not based in Spark -> some settings overridden
In-line
Useful when using flexible schemas, one-off source instances or parameterized sources
Do not need “dummy” dataset object
Based in Spark, properties native to data flow
Most connectors are only available in one
17. Supported connectors
File-based data stores (ADLS Gen1/Gen2, Azure Blob Storage)
Parquet, JSON, DelimitedText, Excel, Avro, XML
In-line only: Common Data Model, Delta Lake
SQL tables
Azure SQL Database
Azure Synapse Analytics (formerly SQL DW)
Cosmos DB
Coming soon: Snowflake
If not supported, ingest to staging area via Copy activity
90+ connectors supported natively
18. Duplicating data streams
Duplicate data stream from any stage of your data flow
Select ‘New branch’
Operate on same data with different transformation requirements
Self-joins
Writing to different sinks
Aggregating in one branch
19. Joining two data streams together
Use Join transformation to append columns from incoming stream to any stream in your data flow
Join types: full outer, inner, left outer, right outer, cross
SQL Join equivalent
Match on computed columns or use non-equality conditions
Broadcast small data streams to cache data and improve performance
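For reference, a join rendered in the underlying data flow script (covered later in this deck) has roughly the following shape; the stream and column names here are hypothetical and exact options vary by version:

    Orders, Customers join(Orders@CustomerId == Customers@CustomerId,
        joinType:'inner',
        broadcast: 'auto') ~> JoinOrdersToCustomers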
20. Lookup transformation
Similar to left outer join, but with more functionality
All incoming rows are passed through regardless of match
Matching conditions same as a join
Multi or single row lookup
Match on all, first, last, or any row that meets join conditions
isMatch() function can be used in downstream transformations to verify output
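As a simple illustration, a derived column placed after the lookup could flag unmatched rows with an expression like the following (column name is hypothetical):

    matchStatus = iif(isMatch(), 'matched', 'unmatched')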
21. Exists transformation
Check for existence of a value in another stream
SQL Exists equivalent
See if any row matches in a subquery, just like SQL
Filter based on join matching conditions
Choose Exist or Not Exist for your filter conditions
Can specify a custom expression
22. Union transformation
Combine rows from multiple streams
Add as many streams as needed
Combine data based upon column name or ordinal column position
Use cases:
Similar data from different connections that undergoes the same transformations
Writing multiple data streams into the same sink
23. Conditional split
Split data into separate streams based upon conditions
Use the data flow expression language to evaluate a boolean expression
Use cases:
Sinking a subset of data to different locations
Performing different calculations on data depending on a set of values
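Split conditions are ordinary boolean expressions in the data flow expression language. As an illustrative sketch (column names are hypothetical):

    Region == 'EMEA'
    toInteger(OrderQty) > 100 && not(isNull(ShipDate))

Rows that satisfy none of the conditions are routed to the default output stream.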
24. Derived column
Transform data at row and column level using expression language
Generate new or modify existing columns
Build expressions using the expression builder
Handle structured or unstructured data
Use column patterns to match on rules and regular expressions
Can be used to transform multiple columns in bulk
Most heavily used transformation
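A few illustrative expressions, assuming hypothetical columns CustomerName and Discount:

    cleanName = trim(upper(CustomerName))
    discount = iif(isNull(Discount), toDecimal(0), toDecimal(Discount))

With column patterns, the same kind of expression can be applied in bulk, e.g. trim($$) against every string column, where $$ refers to the matched column's value.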
25. Select transformation
Metadata and column maintenance
SQL Select statement
Alias or renames data stream and columns
Prune unwanted or duplicate columns
Common after joins and lookups
Rule-based mapping for flexible schemas, bulk mapping
Map hierarchical columns to a flat structure
26. Surrogate key transformation
Generate an incrementing key to use as a non-business key in your data
To seed the starting value of your surrogate key, use a derived column and a lookup from an existing table
Examples are in documentation
Useful for generating keys for star schema dimension tables
27. Aggregate transformation
Aggregate data into groups using aggregate function
Like SQL GROUP BY clause in a Select statement
Aggregate functions include sum(), max(), avg(), first(), collect()
Choose columns to group by
One row for each unique group by column value
Only columns used in transformation are in output data stream
Use self-join to append to existing data
Supports pattern matching
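In data flow script form, an aggregate looks roughly like this (stream and column names are hypothetical, and exact syntax may vary):

    SalesRows aggregate(groupBy(CustomerId, OrderYear),
        totalSales = sum(SalesAmount),
        orderCount = count()) ~> SalesByCustomerYear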
28. Pivot and unpivot transformations
Pivot row values into new columns and vice-versa
Both are aggregate transformations that require aggregate functions
If pivot key values are not specified, all new columns become drifted
Use the ‘Map drifted’ quick action to add them to the schema quickly
29. Window transformation
Aggregate data across “windows” of data partitions
Used to compare a row of data against others in its ‘group’
Group determined by group-by columns, sorting conditions and range bounds
Used for ranking rows in a group and getting lead/lag
Sorting causes reshuffling of data
“Expensive” operation
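Typical window expressions, configured together with the over/sort/range settings of the transformation (column names are hypothetical):

    salesRank = denseRank()
    previousAmount = lag(SalesAmount, 1)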
31. Alter row transformation
Mark rows as Insert, Update, Delete, or Upsert
Like SQL MERGE statement
Insert by default
Define policies to update your database
Works with SQL DB, Synapse, Cosmos DB, and Delta Lake
Specify allowed update methods in each sink
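As a rough sketch in data flow script, the row policies are expressed as conditions (names are hypothetical):

    ChangedRows alterRow(upsertIf(true()),
        deleteIf(Status == 'deleted')) ~> MarkRows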
32. Flatten transformation
Unroll array values into individual rows
One row per value
Used to convert hierarchies to flat structures
Opposite of collect() aggregate function
33. Rank transformation
Rank data across an entire dataset
Same as the rank() function in the Window transformation, but scales better for ranking an entire dataset
For ranking of data partitions, use Window rank()
For ranking entire dataset, use Rank transformation
34. Parse transformation
Parses string data from columns that contain formatted text
Currently supported formats: JSON, delimited text
Ex: Turn plain-text JSON strings from a single column in a source file into structured JSON output
35. Sink transformation
Define the properties for landing your data in your destination target data store
Define using dataset or in-line
Can map columns similar to select transformation
Import schema definition from destination
Set actions on destinations
Truncate table or clear folder, SQL pre/post actions, database update methods
Choose how the written data is partitioned
‘Use current partitioning’ is almost always fastest
Note: Writing to a single file can be very slow with large amounts of data
37. Visual expression builder
List of columns being modified
All available functions, fields, parameters, …
Build expressions with full auto-complete and syntax checking
View results of your expression in the data preview pane with live, interactive results
38. Expression language
Expressions are built using the data flow expression language
Expressions can reference:
Built-in expression functions
Defined input schema columns
Data flow parameters
Literals
Certain transformations have unique functions
Count(), sum() in Aggregate, denseRank() in Window, etc
Evaluates to Spark data types
Hundreds of transformation, analytical, array, metadata, string, and mathematical functions available
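A small example combining built-in functions, input columns, and literals (column names are hypothetical):

    fullName = upper(concat(FirstName, ' ', LastName))
    ageBand = iif(toInteger(Age) >= 65, 'senior', 'standard')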
39. Debug mode
Quickly verify logic during development on small interactive cluster
4 core, 60-minute time to live
Enables the following:
Get data preview snapshot at each transformation
Preview output of expression in expression builder
Run debug pipeline with no spin up
Import Spark projection of source schema
Rule of thumb: If developing Data Flows, turn it on right away
Initial 3-5 minute start-up time
43. Parameterizing data flows
Both dataset properties and data-flow expressions can be parameterized
Passed in via data flow activity
Can use data flow or pipeline expression language
Expressions can reference $parameterName
Can be literal values or column references
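For example, a filter condition can reference pipeline-supplied parameters (parameter and column names are hypothetical):

    OrderDate >= $windowStart && OrderDate < $windowEnd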
46. Schema drift
In real-world data integration solutions, source/target data stores change shape
Source data fields can change names
Number of columns can change over time
Traditional ETL processes break when schemas drift
Mapping data flow has built-in handling for flexible schemas
Patterns, rule-based mappings, byName(s) function, etc
Source: Read additional columns on top of what is defined in the source schema
Sink: Write additional columns on top of what is defined in the sink schema
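Drifted columns that are not part of the defined projection can still be referenced by name in expressions; byName() returns an untyped value, so cast it before use (column name is hypothetical):

    regionText = toString(byName('region'))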
50. Data flow activity
Run as an activity in a pipeline
Integrated with existing ADF control flow, scheduling, orchestration, monitoring, CI/CD
Choose which integration runtime (IR) to run on
# of cores, compute type, cluster time to live
Assign parameters
51. Data flow integration runtime
Integrated with existing Azure IR
Choose compute type, # of cores, time to live
Time to live: the time a cluster stays alive after the last execution concludes
Minimal start-up time for sequential data flows
Parameterize compute type, # of cores if using Auto Resolve
53. Data flow security considerations
All data stays inside VMs that run the Databricks cluster, which are spun up JIT for each job
• Azure Databricks attaches storage to the VMs for logging and spill-over from in-memory data frames during job operation. These storage accounts are fully encrypted and within the Microsoft tenant.
• Each cluster is single-tenant and specific to your data and job. This cluster is not shared with any other tenant.
• Data flow processes are completely ephemeral. Once a job is completed, all associated resources are destroyed.
• Both cluster and storage account are deleted
• Data transfers in data flows are protected using certificates
• Active telemetry is logged and maintained for 45 days for troubleshooting by the Azure Data Factory team
55. Best practices – Lifecycle
1. Test your transformation logic using debug mode and data preview
Limit source size or use sample files
2. Test end-to-end pipeline logic using pipeline debug
Verify data is read/written correctly
Used as smoke test before merging your changes
3. Publish and trigger your pipelines within a Dev Factory
Test performance and cluster size
4. Promote pipelines to higher environments such as UAT and PROD using CI/CD
Increase size and scope of data as you get to higher environments
56. Best practices – Debug (Data Preview)
Data Preview
Data preview is inside the data flow designer transformation properties
Uses row limits and sampling techniques to preview a small slice of your data
Allows you to build and validate units of logic with samples of data in real time
You have control over the size of the data limits under Debug Settings
If you wish to test with larger datasets, set a larger compute size in the Azure IR when switching on “Debug Mode”
Data Preview is only a snapshot of data in memory from Spark data frames. This feature does not write any data, so the sink drivers are not utilized and not tested in this mode.
57. Best practices – Debug (Pipeline Debug)
Pipeline Debug
Click debug button to test your data flow inside of a pipeline
Default debug limits the execution runtime so you will want to limit data sizes
Sampling can be applied here as well by using the “Enable Sampling” option in each Source
Use the debug button option of “use activity IR” when you wish to use a job execution compute environment
This option is good for debugging with larger datasets. It will not have the same execution timeout limit as the default debug setting
58. Optimizing data flows
Transformation order generally does not matter
Data flows have a Spark optimizer that reorders logic to run as efficiently as possible
Repartitioning and reshuffling data negates the optimizer
Each transformation has an ‘Optimize’ tab to control partitioning strategies
Generally you do not need to alter it
Altering cluster size and type has a performance impact
Four components
1. Cluster startup time
2. Reading from sources
3. Transformation time
4. Writing to sinks
59. Identifying bottlenecks
1. Cluster startup time: Sequential executions can lower the cluster startup time by setting a TTL in the Azure IR.
2. Sink processing time: Total time to process the stream from source to sink. There is also a post-processing time when you click on the Sink that shows how much time Spark had to spend on partition and job clean-up. Writing to a single file and slow database connections will increase this time.
3. Source read time: Shows how long it took to read data from the source. Optimize with different source partition strategies.
4. Transformation stage time: Shows bottlenecks in your transformation logic. With larger general purpose and memory optimized IRs, most of these operations occur in memory in data frames and are usually the fastest operations in your data flow.
60. Best practices - Sources
When reading from file-based sources, data flows automatically partition the data based on size
~128 MB per partition, evenly distributed
‘Use current partitioning’ will be fastest for file-based sources and for Synapse using PolyBase
Enable staging for Synapse
For Azure SQL DB, use source partitioning on a column with high cardinality
Improves performance, but can saturate your source database
Reading can be limited by the I/O of your source
61. Optimizing transformations
Each transformation has its own optimize tab
Generally better not to alter it: reshuffling is a relatively slow process
Reshuffling can be useful if data is very skewed
One node has a disproportionate amount of data
For Joins, Exists and Lookups:
If you have many of them, memory optimized greatly increases performance
Can ‘Broadcast’ if the data on one side is small
Rule of thumb: Less than 50k rows
Increasing integration runtime can speed up transformations
Transformations that require reshuffling, like Sort, negatively impact performance
62. Best practices - Sinks
SQL:
Disable indexes on target with pre/post SQL scripts
Increase SQL capacity during pipeline execution
Enable staging when using Synapse
File-based sinks:
‘Use current partitioning’ allows Spark to create the output
Output to a single file is a very slow operation
Combines data into a single partition
Often unnecessary for whoever is consuming the data
Can set naming patterns or use data in a column
Any reshuffling of data is slow
Cosmos DB
Set throughput and batch size to meet performance requirements
63. Azure Integration Runtime
Data Flows use JIT compute to minimize running expensive clusters when they are mostly idle
Generally more economical, but each cluster takes ~4 minutes to spin up
IR specifies what cluster type and core-count to use
Memory optimized is best, compute optimized doesn’t generally work for production workloads
When running sequential jobs, utilize Time to Live to reuse the cluster between executions
Keeps the cluster alive for TTL minutes after execution for a new job to use
Maximum one job per cluster
Rule of thumb: start small and scale up
65. Data flow script (DFS)
DFS defines the logical intent of your data transformations
Script is bundled and marshalled to the Spark cluster as a job for execution
DFS can be auto-generated and used for programmatic creation of data flows
Access the script behind the UI via the “Script” button
Click “Copy as Single Line” to save a version of the script that is ready for JSON
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-script
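A minimal sketch of what a generated script looks like; the schema, options, and names here are hypothetical and will vary by connector:

    source(output(
            CustomerId as integer,
            SalesAmount as double
        ),
        allowSchemaDrift: true,
        validateSchema: false) ~> SalesSource
    SalesSource derive(SalesBand = iif(SalesAmount > 1000, 'high', 'standard')) ~> AddBand
    AddBand sink(allowSchemaDrift: true,
        validateSchema: false) ~> SalesSink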
69. ETL Tool Migration Overview
Migrating from an existing large enterprise ETL installation to ADF and data flows requires adherence to a formal methodology that incorporates classic SDLC, change management, project management, and a deep understanding of your current data estate and ETL requirements.
Successful migration projects require project plans, executive sponsorship, budget, and a dedicated team to focus on rebuilding the ETL in ADF.
For existing on-prem ETL estates, it is very important to learn the basics of Cloud, Azure, and ADF generally before taking this Data Flows training.
72. Training
• On-prem to Cloud, Azure general training, ADF general training, Data Flows training
• A general understanding of the difference between legacy client/server on-prem ETL architectures and cloud-based Big Data processing is required
• ADF and Data Flows execute on Spark, so learn the fundamentals of the difference between row-by-row processing on a local server and batch/distributed computing on Spark in the cloud
73. Execution
• Start with the top 10 mission-critical ETL mappings and list out the primary logical goals and steps achieved in each
• Use sample data and debug each scenario as new pipelines and data flows in ADF
• UAT each of those 10 mappings in ADF using sample data
• Lay out an end-to-end project plan for the remaining mapping migrations
• Plan the remainder of the project into quarterly calendar milestones
• Expect each phase to take around 3 months
• The majority of large existing ETL infrastructure modernization migrations take 12-18 months to complete
74. ETL System Integrator Partners
Bitwise Global
https://www.bitwiseglobal.com/webinars/automated-etl-conversion-to-adf-for-accelerated-data-warehouse-migration-on-azure/
Next Pathway
https://blog.nextpathway.com/next-pathway-adds-ground-breaking-capability-to-translate-informatica-and-datastage-etl-pipelines-to-the-cloud-in-latest-version-of-shift