During the second half of 2016, IBM built a state-of-the-art Hadoop cluster with the aim of running massive-scale workloads. The amount of data available to derive insights continues to grow exponentially in this increasingly connected era, resulting in larger and larger data lakes year after year. SQL remains one of the languages most commonly used to perform such analysis, but how do today's SQL-over-Hadoop engines stack up to real BIG data? To find out, we decided to run a derivative of the popular TPC-DS benchmark against a 100 TB dataset, which stresses both the performance and the SQL support of data warehousing solutions! Over the course of the project, we encountered a number of challenges, such as poor query execution plans, uneven distribution of work, and out-of-memory errors. Join this session to learn how we tackled those challenges and what tuning was required at the various layers of the Hadoop stack (including HDFS, YARN, and Spark) to run SQL-on-Hadoop engines such as Spark SQL 2.0 and IBM Big SQL at scale!
Speaker
Simon Harris, Cognitive Analytics, IBM Research
Microsoft Technologies for Data Science 201612 (Mark Tabladillo)
The document discusses Microsoft technologies that can be used for data science, including SQL Server, Azure ML, Cortana Intelligence Suite, and R Server. It provides definitions of key terms like data science, machine learning, and data mining. It also shares links to resources for learning about Microsoft's data science tools and platforms.
Modern big data and machine learning in the era of cloud, docker and kubernetes (Slim Baltagi)
There is a major shift in web and mobile application architecture from the ‘old-school’ one to a modern ‘micro-services’ architecture based on containers. Kubernetes has been quite successful in managing those containers and running them in distributed computing environments.
Now enabling Big Data and Machine Learning on Kubernetes will allow IT organizations to standardize on the same Kubernetes infrastructure. This will propel adoption and reduce costs.
Kubeflow is an open source framework dedicated to making it easy to use the machine learning tool of your choice and deploy your ML applications at scale on Kubernetes. Kubeflow is becoming an industry standard as well!
Both Kubernetes and Kubeflow will enable IT organizations to focus more effort on applications rather than infrastructure.
Exploring microservices in a Microsoft landscape (Alex Thissen)
Presentation for Dutch Microsoft TechDays 2015 with Marcel de Vries:
During this session we will take a look at how to realize a Microservices architecture (MSA) using the latest Microsoft technologies available. We will discuss some fundamental theories behind MSA and show you how this can actually be realized with Microsoft technologies such as Azure Service Fabric. This session is a real must-see for any developer that wants to stay ahead of the curve in modern architectures.
The Future of Data Warehousing, Data Science and Machine Learning (ModusOptimum)
Watch the on-demand recording here:
https://event.on24.com/wcc/r/1632072/803744C924E8BFD688BD117C6B4B949B
Evolution of Big Data and the Role of Analytics | Hybrid Data Management
IBM: driving the future hybrid data warehouse with the IBM Integrated Analytics System.
Machine learning services with SQL Server 2017 (Mark Tabladillo)
SQL Server 2017 introduces Machine Learning Services with two independent technologies: R and Python. The purpose of this presentation is 1) to describe major features of this technology for technology managers; 2) to outline use cases for architects; and 3) to provide demos for developers and data scientists.
Microsoft Data Platform Airlift 2017: Machine Learning with SQL S... (Rui Quintino)
The document discusses machine learning with SQL Server 2016 and R Services. It provides an overview of machine learning, R programming language, and the challenges of using R with SQL databases prior to SQL Server 2016. SQL Server 2016 introduces R Services, which allows running R code directly in the database for high performance, scalable machine learning. R Services integrates R with SQL Server through in-database deployment and parallel processing capabilities. This eliminates data movement and scaling issues while leveraging existing R and SQL skills.
Automating the Enterprise with CloudForms & Ansible (Jerome Marc)
Automating the Enterprise with CloudForms & Ansible:
- Self-service IT requests and automated delivery of IT services.
- Automated configuration and policy enforcement of deployed systems.
- Operational visibility and control.
IBM Cloud Pak for Data is a unified platform that simplifies data collection, organization, and analysis through an integrated cloud-native architecture. It allows enterprises to turn data into insights by unifying various data sources and providing a catalog of microservices for additional functionality. The platform addresses challenges organizations face in leveraging data due to legacy systems, regulatory constraints, and time spent preparing data. It provides a single interface for data teams to collaborate and access over 45 integrated services to more efficiently gain insights from data.
Big SQL: Powerful SQL Optimization - Re-Imagined for open source (DataWorks Summit)
Let's be honest - there are some pretty amazing capabilities locked in proprietary SQL engines which have had decades of R&D baked into them. At this session, learn how IBM, working with the Apache community, has unlocked the value of their SQL optimizer for Hive, HBase, ObjectStore, and Spark - helping customers avoid lock-in while providing best performance, concurrency and scalability for complex, analytical SQL workloads. You'll also learn how the SQL engine was extended and integrated with Ambari, Ranger, YARN/Slider and HBase. We share the results of this project which has enabled running all 99 TPC-DS queries at world record breaking 100TB scale factor.
This document summarizes Cisco's Partner Summit 2017, focusing on enabling a multicloud world. It introduces Cisco's new multicloud portfolio and offerings to help partners design, migrate, manage, and secure customer workloads across public and private clouds. Key speakers discuss opportunities in multicloud consulting, managed services, and software integration. Cisco and Google announce an open hybrid cloud solution integrating Google Cloud Platform with Cisco infrastructure software.
This document provides an overview of Microsoft's hybrid cloud and platform strategy. It discusses Microsoft's public cloud offerings through Azure and Azure Stack, as well as hybrid capabilities like StorSimple for storage, SQL Server 2016 StretchDB for databases, and ServiceBus for app integration. It positions Microsoft as uniquely able to enable customers' hybrid cloud strategies through its comprehensive set of hybrid products and technologies.
Implementing Security on a Large Multi-Tenant Cluster the Right Way (DataWorks Summit)
Raise your hands if you are deploying Kerberos and other Hadoop security components after deploying Hadoop to the enterprise. We will present the best practices and challenges of implementing security on a large multi-tenant Hadoop cluster spanning multiple data centers. Additionally, we will outline our authentication & authorization security architecture, how we reduced complexity through planning, and how we worked with multiple teams and organizations to implement security the right way the first time. We will share lessons learned and takeaways for implementing security at your company.
We will walk through the implementation and its impacts to the user, development, support and security communities and will highlight the pitfalls that we navigated to achieve success. Protecting your customers and information assets is critical to success. If you are planning to introduce Hadoop security to your ecosystem, don’t miss this in depth discussion on a very important and necessary component to enterprise big data.
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop (huguk)
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone.
Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos."
The annual review session by the AMIS team on their findings, interpretations and opinions regarding news, trends, announcements and roadmaps around Oracle's product portfolio. This presentation discusses architecture trends, container technology, disruptive movements such as IoT, Blockchain, Intelligent Bots and Machine Learning, Modern User Experience, Enterprise Integration, Autonomous Systems in general and Autonomous Database in particular, Security, Cloud, Networking, Java, High PaaS & Low PaaS, DevOps, Microservices, Hybrid Cloud. This Oracle OpenWorld - more than any in recent history - rocked the foundations of the Oracle platform and opened up some real new roads ahead. This presentation leads you through the most relevant announcements and new directions.
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa... (MSAdvAnalytics)
Lance Olson. Cortana Analytics is a fully managed big data and advanced analytics suite that helps you transform your data into intelligent action. Come to this two-part session to learn how you can do "big data" processing and storage in Cortana Analytics. In the first part, we will provide an overview of the processing and storage services. We will then talk about the patterns and use cases which make up most big data solutions. In the second part, we will go hands-on, showing you how to get started today with writing batch/interactive queries, real-time stream processing, or NoSQL transactions all over the same repository of data. Crunch petabytes of data by scaling out your computation power to any sized cluster. Store any amount of unstructured data in its native format with no limits to file or account size. All of this can be done with no hardware to acquire or maintain and minimal time to setup giving you the value of "big data" within minutes. Go to https://channel9.msdn.com/ to find the recording of this session.
Create B2B Exchanges with Cisco Connected Processes: an overview (Cisco DevNet)
A session in the DevNet Zone at Cisco Live, Berlin. The opportunity cost of business disruptions in the hyper-connected world can be very high. To ensure business continuity and optimization, organizations are automating many critical workflows and infrastructure operations throughout their enterprise and extended ecosystems. Cisco Connected Processes software enables architects, application developers, and integration professionals to deliver business processes and automation as a service, while managing workflows and data more efficiently and effectively. Join this session to learn how scalable operational efficiencies can save you time and money while simplifying collaboration between all the members of your technical community.
Securing your Big Data Environments in the Cloud (DataWorks Summit)
Big Data tools are becoming a critical part of enterprise architectures, and as such, securing the data, at rest and in motion, is a necessity. Even more so when you're implementing these solutions in the cloud and the data doesn't reside within the confines of your trusted data center. There is also a fine balance between implementing enterprise-grade security and maintaining utmost performance, given the overheads of encryption and/or identity management.
This session is designed to tackle these challenges head on and explain the various options available in the cloud. The focal points are the implementation of tools like Ranger and Knox for cloud deployments, but we also pay attention to the security features offered in the cloud that complement this process and secure the data in unprecedented ways.
Cloud Security + OSS Security tools are a deadly combination when it comes to securing your Data Lake.
Still on IBM BigInsights? We have the right path for you (ModusOptimum)
In 2017, IBM and Hortonworks formed a strategic partnership to deliver data science and machine learning capabilities to customers, releasing a combined solution that includes the #1 open source platform for Hadoop—Hortonworks Data Platform.
Now, the end of support date for IBM BigInsights is fast approaching, on 30 June 2019. In this webinar, we will explain why now is the time to make the move from IBM BigInsights, and how IBM & Hortonworks will support you as you do.
Oracle Database in Cloud, DR in Cloud and Overview of Oracle Database 18c (AiougVizagChapter)
This document provides a profile summary of Malay Kumar Khawas, a Principal Consultant at Oracle India. It outlines his professional experience including over 12 years working with Oracle technologies. It also lists his areas of expertise, which include Oracle Database, Cloud implementations, identity management, disaster recovery, and various Oracle products. The document then provides an agenda for a presentation on Oracle Database Cloud Services, disaster recovery in Oracle Public Cloud, and new features in Oracle Database 18c.
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017 (Codemotion)
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
IBM + REDHAT "Creating the World's Leading Hybrid Cloud Provider..." (Gustavo Cuervo)
1. IBM and Red Hat have a long partnership in open source technologies spanning over 20 years. They are leaders in hybrid cloud, multi-cloud, open source, security, and data solutions.
2. Together, IBM and Red Hat provide solutions across traditional, private and public cloud environments to help clients create cloud-native applications and drive portability across clouds.
3. IBM and Red Hat share common beliefs around innovation, containers, and the importance of open standards and hybrid/multi-cloud environments. Their partnership provides enterprises flexibility and choice in their cloud journeys.
Benefits of Transferring Real-Time Data to Hadoop at Scale (Hortonworks)
Today’s Big Data teams demand solutions designed for Big Data that are optimized, secure, and adaptable to changing workload requirements. Working together, Hortonworks, IBM, and Attunity have designed an integrated solution that transfers large volumes of data to a platform that can handle rapid ingest, processing and analysis of data of all types from all sources, at scale.
https://hortonworks.com/webinar/benefits-transferring-real-time-data-hadoop-scale-ibm-hortonworks-attunity/
This document summarizes new features in SQL Server 2019 including intelligent query processing, data classification and auditing, accelerated database recovery, data virtualization, SQL Server replication in one command, additional capabilities and migration tools, and a modern platform with Linux, containers, and machine learning services. It provides examples of how these features can help solve modern data challenges and gain performance without changing applications.
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform (Hortonworks)
Find out how Hortonworks and IBM help you address these challenges and optimize your existing EDW environment.
https://hortonworks.com/webinar/modernize-existing-edw-ibm-big-sql-hortonworks-data-platform/
Andriy Zrobok, "MS SQL 2019 - new for Big Data Processing" (Lviv Startup Club)
This document discusses MS SQL Server 2019's capabilities for big data processing through PolyBase and Big Data Clusters. PolyBase allows SQL queries to join data stored externally in sources like HDFS, Oracle and MongoDB. Big Data Clusters deploy SQL Server on Linux in Kubernetes containers with separate control, compute and storage planes to provide scalable analytics on large datasets. Examples of using these technologies include data virtualization across sources, building data lakes in HDFS, distributed data marts for analysis, and integrated AI/ML tasks on HDFS and SQL data.
Data Analytics Meetup: Introduction to Azure Data Lake Storage (CCG)
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist Audrey Hammonds. In this video she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now: https://bit.ly/2NUX1H6
This document provides an overview of a course on implementing a modern data platform architecture using Azure services. The course objectives are to understand cloud and big data concepts, the role of Azure data services in a modern data platform, and how to implement a reference architecture using Azure data services. The course will provide an ARM template for a data platform solution that can address most data challenges.
This document provides an overview of 6 modules related to SQL Server workshops:
- Module 1 covers database design and architecture sessions
- Module 2 focuses on intelligent query processing, data classification/auditing, database recovery, data virtualization, and replication capabilities
- Module 3 discusses the big data landscape, including data growth drivers, common use cases, and scale-out processing approaches like Hadoop and Spark
The document provides an overview of the Databricks platform, which offers a unified environment for data engineering, analytics, and AI. It describes how Databricks addresses the complexity of managing data across siloed systems by providing a single "data lakehouse" platform where all data and analytics workloads can be run. Key features highlighted include Delta Lake for ACID transactions on data lakes, auto loader for streaming data ingestion, notebooks for interactive coding, and governance tools to securely share and catalog data and models.
SQL Saturday Redmond 2019: ETL Patterns in the Cloud (Mark Kromer)
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, scaling ETL in the cloud, handling flexible schemas, and using ADF for orchestration. Key points include staging data in low-cost storage before processing, using ADF's integration runtime to process data both on-premises and in the cloud, and building resilient data flows that can handle schema drift.
The document outlines the roadmap for SQL Server, including enhancements to performance, security, availability, development tools, and big data capabilities. Key updates include improved intelligent query processing, confidential computing with secure enclaves, high availability options on Kubernetes, machine learning services, and tools in Azure Data Studio. The roadmap aims to make SQL Server the most secure, high performing, and intelligent data platform across on-premises, private cloud and public cloud environments.
Azure Data Factory ETL Patterns in the Cloud (Mark Kromer)
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
In this session, we explain how the new version of SQL Server will improve database operations, advance security and compliance and bring advanced analytics to all your data workloads.
Prague data management meetup 2018-03-27 (Martin Bém)
This document discusses different data types and data models. It begins by describing unstructured, semi-structured, and structured data. It then discusses relational and non-relational data models. The document notes that big data can include any of these data types and models. It provides an overview of Microsoft's data management and analytics platform and tools for working with structured, semi-structured, and unstructured data at varying scales. These include offerings like SQL Server, Azure SQL Database, Azure Data Lake Store, Azure Data Lake Analytics, HDInsight and Azure Data Warehouse.
The document discusses modernizing a data warehouse using the Microsoft Analytics Platform System (APS). APS is described as a turnkey appliance that allows organizations to integrate relational and non-relational data in a single system for enterprise-ready querying and business intelligence. It provides a scalable solution for growing data volumes and types that removes limitations of traditional data warehousing approaches.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service, that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Microsoft Data Platform - What's included (James Serra)
This document provides an overview of a speaker and their upcoming presentation on Microsoft's data platform. The speaker is a 30-year IT veteran who has worked in various roles including BI architect, developer, and consultant. Their presentation will cover collecting and managing data, transforming and analyzing data, and visualizing and making decisions from data. It will also discuss Microsoft's various product offerings for data warehousing and big data solutions.
The document discusses Azure Data Factory and its capabilities for cloud-first data integration and transformation. ADF allows orchestrating data movement and transforming data at scale across hybrid and multi-cloud environments using a visual, code-free interface. It provides serverless scalability without infrastructure to manage along with capabilities for lifting and running SQL Server Integration Services packages in Azure.
This session focuses on the needs of the data integrator and data engineer whether that be for data warehousing & BI, advanced analytics of data for SaaS applications. We walk through a comprehensive set of new additions to Azure Data Factory and SSIS for moving and integrating data across on-premises and cloud. Topics and examples will include simple, scalable and reliable data pipelines in ADF using a serverless, parallel data movement service to/from the cloud, provisioning of Azure-SSIS Integration Runtime (IR) – dedicated servers for lifting & shifting SSIS packages to cloud– and its customization with your own/3rd party extensions, the execution of SSIS packages as first-class activities in ADF pipelines and their combination with other ADF activities to create modern ETL/ELT workflows all through the new code-free experience.
PASS Summit - SQL Server 2017 Deep Dive (Travis Wright)
Deep dive into SQL Server 2017 covering SQL Server on Linux, containers, HA improvements, SQL graph, machine learning, Python, adaptive query processing, and much, much more.
SQL Server 2017 Deep Dive - @Ignite 2017 (Travis Wright)
This was a presentation given at Ignite 2017 on SQL Server 2017. It covers the main new capabilities of SQL Server 2017. The video recording of the session is available here: https://myignite.microsoft.com/sessions/54946?source=sessions
Microsoft Ignite 2017 - SQL Server on Kubernetes, Swarm, and OpenShift (Travis Wright)
This document discusses containers and container technologies like Docker. It provides examples of real world uses of containers at Microsoft including for automated testing of SQL Server on Linux with hundreds of containers running tests simultaneously. It also covers container networking and how containers connect within and across hosts. Persistent storage options for containers using technologies like NFS, Ceph, Azure Blob storage are presented. Secret management in containers using an encrypted distributed store is also summarized.
SQL Server 2017 will be available on Linux, providing customers choice in platforms. It will include the database engine, integration services and support for technologies like in-memory processing and always encrypted. The same SQL Server licenses can be used on Windows or Linux, with previews available free of charge. Early adopters can test SQL Server 2017 on Linux through a special program and provide feedback to Microsoft.
SQL Server 2017 will bring SQL Server to Linux for the first time. This presentation covers the scope, schedule, and architecture as well as a background on why Microsoft is making SQL Server available on Linux.
SQL Server 2017 Overview and Partner Opportunities (Travis Wright)
SQL Server 2017 is going to be released later this year. In this session will cover what to expect and how partners can deliver additional value to SQL Server customers.
Vision presentation for the Data Amp event in Johannesburg, South Africa. Discusses Microsoft data platform strategy to be the most intelligent, trusted, and flexible data platform.
Data Amp South Africa - SQL Server 2017 (Travis Wright)
Roadmap deck showing the newest capabilities of SQL Server 2017 including SQL Server on Linux, R/Python services, graph, adaptive query processing as well as new Azure services like Cosmos DB and Azure Database for PostgreSQL and MySQL.
This document provides an overview of SQL Server 2017 and how it can power an organization's entire data estate from on-premises to the cloud. It highlights key capabilities including business intelligence, advanced analytics, data management, security, flexibility and hybrid cloud capabilities with Microsoft Azure. Specific features are showcased such as in-memory technologies, graph support, mobile reporting, R and Python integration, and bringing these capabilities to any platform including Linux and containers. Performance and security benefits are emphasized along with case studies demonstrating the value SQL Server 2017 can provide organizations.
The SQL Server Engineering Team uses Kubernetes in Azure VMs for automated testing of SQL Server on Linux. They automate the build process to create container images and extended the test system to provision and execute tests targeting around 700 containers per run, usually daily, across 150 Azure VM hosts with 128GB of RAM and 8 cores each. The VMs can support 20+ SQL Server containers listening on different ports for high density testing.
SQL Server is container-ready. This deck covers some of the common ideas, misconceptions, myths, and realities of databases like SQL Server in a DevOps model.
SQL Server v.Next will be released for Linux in 2017. The summary provides an overview of the key points about SQL Server on Linux including:
- SQL Server will have the same functionality and capabilities on Linux as on Windows. It will support the same editions and features such as high availability, security, and programming features.
- The architecture involves a SQL Platform Abstraction Layer that maps Windows APIs to Linux system calls to provide a consistent programming model.
- An early adoption program is currently underway to get feedback from customers and partners on functionality and to help validate SQL Server on Linux prior to general availability in 2017.
SUSE Webinar - Introduction to SQL Server on Linux (Travis Wright)
Introduction to SQL Server on Linux for SUSE customers. Talks about the scope of the first release of SQL Server on Linux, the schedule, and the Early Adoption Program. The recording is available here:
https://www.brighttalk.com/webcast/11477/243417
Nordic Infrastructure Conference 2017 - SQL Server in DevOps (Travis Wright)
SQL Server is coming to Linux in the next major version of SQL Server. Having SQL Server in Linux containers makes it much easier to automate dev/test, CI/CD, and build pipelines. This session describes some of the common challenges currently faced in trying to use SQL Server in Linux containers and how to overcome them. Integration with Red Hat OpenShift is also discussed.
Semantic Cultivators: The Critical Future Role to Enable AI (artmondano)
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API (UiPathCommunity)
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
AI and Data Privacy in 2025: Global Trends (InData Labs)
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
- AI and data privacy: key findings
- Statistics on AI data privacy in today's world
- Tips on how to overcome data privacy challenges
- Benefits of AI data security investments
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, presentation slides, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
HCL Nomad Web – Best Practices and Managing Multiuser Environments (panagenda, German-language edition)
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-und-verwaltung-von-multiuser-umgebungen/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client updates are installed "automatically" in the background, which significantly reduces the administrative overhead compared to traditional HCL Notes clients. However, troubleshooting in Nomad Web presents unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how the troubleshooting process in HCL Nomad Web can be simplified to ensure a smooth and efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser's cache (using OPFS)
- Understanding the differences between single- and multi-user scenarios
- Using the Client Clocking feature
HCL Nomad Web – Best Practices and Managing Multiuser Environments (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understanding the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Linux Support for SMARC: How Toradex Empowers Embedded Developers (Toradex)
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8 M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
What is Model Context Protocol (MCP) - The new technology for communication bw... (Vishnu Singh Chundawat)
The MCP (Model Context Protocol) is a framework designed to manage context and interaction within complex systems. This SlideShare presentation will provide a detailed overview of the MCP Model, its applications, and how it plays a crucial role in improving communication and decision-making in distributed systems. We will explore the key concepts behind the protocol, including the importance of context, data management, and how this model enhances system adaptability and responsiveness. Ideal for software developers, system architects, and IT professionals, this presentation will offer valuable insights into how the MCP Model can streamline workflows, improve efficiency, and create more intuitive systems for a wide range of use cases.
Procurement Insights Cost To Value Guide.pptx (Jon Hansen)
Procurement Insights' integrated Historic Procurement Industry Archives serve as a powerful complement, not a competitor, to other procurement industry firms. They fill critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I... (Impelsys Inc.)
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights (Andrew Marnell)
With expertise in data architecture, performance tracking, and revenue forecasting, Andrew Marnell plays a vital role in aligning business strategies with data insights. Andrew Marnell’s ability to lead cross-functional teams ensures businesses achieve sustainable growth and operational excellence.
Quantum Computing Quick Research Guide by Arthur Morgan (Arthur Morgan)
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
Dev Dives: Automate and orchestrate your processes with UiPath Maestro (UiPathCommunity)
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
Microsoft Ignite 2018: SQL Server 2019 Big Data Clusters - Deep Dive Session
2. Deep Dive on SQL Server and Big Data
Travis Wright, Mihaela Blendea, Umachandar Jayachandran
Program Managers, SQL Server
3. SQL Server enables intelligence over all your data
- Managing all data: easily and securely manage data big and small
- Integrating all data: unified access to all your data with unparalleled performance
- Analyzing all data: build intelligent apps and AI with all your data
Simplified management and analysis through a unified deployment, governance, and tooling.
5. Easily deploy and manage a SQL Server + Big Data cluster
Easily deploy and manage a Big Data cluster using Microsoft's Kubernetes-based Big Data solution built into SQL Server. Hadoop Distributed File System (HDFS) storage, the SQL Server relational engine, and Spark analytics are deployed as containers on Kubernetes in one easy-to-manage package.
Benefits of containers and Kubernetes:
- Fast to deploy
- Self-contained: no installation required
- Upgrades are easy: just upload a new image
- Scalable, multi-tenant, designed for elasticity
6. SQL Server Big Data Cluster layout
[Architecture diagram] Analytics tools, custom apps, and BI clients connect to the SQL Server master instance. Behind it, one or more compute pools (SQL compute nodes) process queries; a data pool (SQL data nodes) caches data; and a storage pool combines SQL Server, Spark, and an HDFS data node within each Kubernetes pod, backed by persistent storage. A controller manages the cluster, IoT data can be read directly from HDFS, and all components run as pods across Kubernetes nodes.
8. Base node configuration
Applies to nodes across all planes. Services running on every Kubernetes node:
- kubelet: Kubernetes local agent
- kube-proxy: network configuration and forwarding
- supervisord: process monitoring and control
- fluentd: node logging
- flanneld: software-defined network
- collectd: OS and application data collection
- SQL Big Data watchdog: config sync, watchdog, data collection (DMVs, etc.)
9. Control plane
External endpoints:
- Kubernetes (REST)
- Aris Control Service (REST)
- Knox Gateway (REST gateway for Hadoop APIs)
- SQL Server master (TDS gateway for data marts and the SQL master service)
Services:
- etcd and the Kubernetes master services
- Controller
- SQL master instance
- SQL Big Data admin portal
- Knox Gateway and the HDFS Name Service
- YARN master and Hive Metastore
- InfluxDB (metrics store)
- Livy (REST interface for Spark) and the Spark driver
[Diagram] These services, together with Elasticsearch, Grafana, and Kibana, are distributed across Kubernetes nodes that each also run the base node services plus etcd.
10. Controller
External REST/HTTPS endpoint. The controller service handles:
- Bootstrap and build-out
- Managing capacity (add/remove)
- Configuring high availability and recovering from failure (Availability Groups)
- Security (authN, authZ, certificate rotation)
- Lifecycle (upgrade/downgrade/rollback)
- Configuration management
- Monitoring: capacity, health, metrics, logs
- Troubleshooting: performance, failures
- The cluster admin portal
11. SQL master instance
- TDS endpoint into the cluster
- High-value data
- OLTP server
- Data connectors
- Machine learning & extensibility
- Scalable query engine with readable secondary replicas
- Built-in high availability with Always On Availability Groups (coming soon)
[Diagram] Master instance Availability Group
12. Compute plane
Hosts one or more SQL compute pools. A compute pool is a group of SQL engine instances that forms a data, security, and resource boundary. The compute pool processes complex distributed queries against the data plane; local storage is used for shuffling data if necessary.
[Diagram] Compute pool nodes, each running the base node services plus the SQL engine.
13. Data plane
Storage pool:
- Data ingestion through Spark (batch and streaming)
- Data storage in HDFS
- Data access through HDFS and SQL endpoints; the SQL engine reads files in HDFS directly
Data pool:
- Partitioned, in-memory cache for external data or HDFS
- Scale-out data storage for append-only data sets
- Data ingestion through Spark
[Diagram] Storage pool nodes run the base node services, SQL engine, HDFS, and Spark; data pool nodes run the base node services and SQL engine.
18. SQL Server is the hub for integrating data
Easily combine data across relational and non-relational data stores.
[Diagram] T-SQL, analytics tools, and apps query SQL Server, which reaches NoSQL stores, relational databases, and Big Data over ODBC through PolyBase external tables.
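As a hedged illustration of what the PolyBase connectors look like in practice, the following T-SQL sketch defines an external table over an Oracle source and joins it with local data. All object names here (OracleTelemetry, dbo.SensorReadings, dbo.Devices, the host, and the column list) are hypothetical; only the general CREATE EXTERNAL DATA SOURCE / CREATE EXTERNAL TABLE pattern reflects the SQL Server 2019 feature described above.

-- All names are hypothetical; a sketch of the SQL Server 2019 PolyBase pattern.
-- Assumes a database master key already exists for the scoped credential.
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = 'oracle_user', SECRET = '<oracle password>';

-- External data source pointing at an Oracle instance, with predicate pushdown.
CREATE EXTERNAL DATA SOURCE OracleTelemetry
WITH (LOCATION = 'oracle://oraclehost:1521',
      PUSHDOWN = ON,
      CREDENTIAL = OracleCredential);

-- External table over a remote Oracle table; queried like any local table.
CREATE EXTERNAL TABLE dbo.SensorReadings
(
    sensor_id   INT,
    reading_ts  DATETIME2,
    reading_val FLOAT
)
WITH (LOCATION = '[XE].[TELEMETRY].[SENSOR_READINGS]',
      DATA_SOURCE = OracleTelemetry);

-- Combine remote and local data in a single T-SQL statement.
SELECT TOP 10 s.sensor_id, s.reading_val, d.location_name
FROM dbo.SensorReadings AS s
JOIN dbo.Devices AS d ON d.sensor_id = s.sensor_id;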
20. Scale-out data pools combine and cache data from many sources for fast querying
Scenario: a global car manufacturing company wants to join data from across multiple sources, including HDFS, SQL Server, and Cosmos DB.
Solution:
- Query data in relational and non-relational data stores with the new PolyBase connectors
- Create a scale-out data pool cache of the combined data
- Expose the datasets as a shared data source, without writing code to move and integrate data
[Diagram] PolyBase connectors pull IoT data from HDFS, Cosmos DB, and SQL Server into a scale-out data pool partitioned across shards 1 through n.
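The data pool cache in this scenario is itself defined with T-SQL. A minimal sketch, assuming the CTP-era big data cluster syntax in which each database ships with a built-in SqlDataPool external data source; the table and column names are hypothetical.

-- Hypothetical cached table, distributed round-robin across the data pool nodes.
-- Assumes the built-in SqlDataPool external data source
-- (LOCATION = 'sqldatapool://controller-svc/default' in the CTP releases).
CREATE EXTERNAL TABLE dbo.web_clickstreams_cache
(
    wcs_user_sk     BIGINT,
    wcs_web_page_sk BIGINT,
    wcs_click_ts    DATETIME2
)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);

-- Populate the cache once; analysts then query it like any other table.
INSERT INTO dbo.web_clickstreams_cache
SELECT wcs_user_sk, wcs_web_page_sk, wcs_click_ts
FROM dbo.web_clickstreams_source;  -- e.g. a PolyBase external table over HDFS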
22. SQL Server can now read directly from HDFS files
- Elastically scale compute and storage using HDFS-based storage pools with SQL Server and Spark built in
- Mount and manage remote stores through HDFS: mount various on-prem and cloud data stores
- Accelerate computation by caching data locally
- Disaster recovery / data backup
[Diagram] The SQL Server master instance and Spark sit on top of a storage pool whose nodes each combine SQL Server, Spark, and an HDFS data node; the pool can also mount other HDFS stores and remote cloud stores.
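A sketch of what reading HDFS directly looks like from the master instance, again assuming the CTP-era syntax with a built-in SqlStoragePool data source; the file path, format options, and table definition are hypothetical.

-- File format for comma-separated files landed in HDFS (header row skipped).
CREATE EXTERNAL FILE FORMAT csv_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

-- Hypothetical external table over raw IoT files in the storage pool.
-- Assumes the built-in SqlStoragePool external data source
-- (LOCATION = 'sqlhdfs://controller-svc/default' in the CTP releases).
CREATE EXTERNAL TABLE dbo.sensor_files
(
    sensor_id   INT,
    reading_ts  VARCHAR(30),
    reading_val FLOAT
)
WITH (LOCATION = '/iot/sensors/2018',
      DATA_SOURCE = SqlStoragePool,
      FILE_FORMAT = csv_format);

SELECT COUNT(*) FROM dbo.sensor_files;  -- the SQL engine reads the HDFS files directly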
27. Data scientists can use familiar tools to analyze structured and unstructured data
1. Use SQL Ops Studio notebooks to run a Spark job over structured and unstructured data
2. Spark SQL jobs can access data in SQL Server
3. Queries can be pushed down to other data sources like Oracle Database and MongoDB
4. The Spark job returns the data to the notebook
[Diagram] Azure Data Studio drives Spark in the storage pool, the SQL Server master instance, and external data sources.
28. Model & serve
[Diagram] An end-to-end pipeline: ingest structured data from business/custom apps and unstructured data from logs, files, media, sensors, and IoT with Spark Streaming; store it in HDFS and SQL Server data pools; prep & train with Spark, Spark ML, and SQL Server ML Services on the SQL Server master instance; then model & serve through REST API containers for models to predictive apps and BI tools.
Integrate structured and unstructured data. Simplified management and analysis through a unified deployment, governance, and tooling.
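On the master instance, the prep & train and serving steps can run in-database through SQL Server Machine Learning Services. A minimal sketch: sp_execute_external_script is the real entry point, but the input query, column names, and the one-line "model" are purely illustrative stand-ins for trained model logic.

-- Score rows in-database with Python via ML Services (illustrative logic only).
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
df = InputDataSet                          # rows produced by @input_data_1
df["predicted"] = df["reading_val"] * 0.9  # placeholder for a real trained model
OutputDataSet = df
',
    @input_data_1 = N'SELECT TOP (100) sensor_id, reading_val FROM dbo.sensor_files'
WITH RESULT SETS ((sensor_id INT, reading_val FLOAT, predicted FLOAT));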
30. Training and scoring with MLeap
[Diagram] Models are trained with Spark across the storage pool and scored in the SQL Server master instance via the MLeap runtime.
33. Managed SQL Server, Spark, and data lake
- Store high-volume data in a data lake and access it easily using either SQL or Spark
- Management services, an admin portal, and integrated security make it all easy to manage
Data virtualization:
- Combine data from many sources without moving or replicating it
- Scale out compute and caching to boost performance
Complete AI platform:
- Easily feed integrated data from many sources to your model training
- Ingest and prep data, then train, store, and operationalize your models all in one system
[Diagram] T-SQL, analytics tools, and apps connect through SQL Server external tables and open database connectivity to NoSQL stores, relational databases, and HDFS. The platform combines compute pools and data pools, Spark and Spark ML, SQL Server ML Services, scalable shared HDFS storage, external data sources, REST API containers for models, an admin portal with management services, and integrated AD-based security.
#5: Challenge: Managing relational and Big Data is complicated
Pillar: Simplify big data management with improved performance and security
#6: Kubernetes-based Big Data distribution is easy to deploy, upgrade and patch, and scales in seconds using compute pools
(Visually tie this container to the containers in the previous slide. Zoom in to show what's inside one of the containers.)
#17: Challenge: Data movement can be problematic
Pillar: Gain unified access to all your data using data virtualization
#19: Query across relational and non-relational data stores including Oracle, Teradata, MongoDB, etc.
Besides supporting Hadoop, PolyBase now lets you query over RDBMS, NoSQL, and generic ODBC sources.
#21: Increase performance for data virtualization using scale-out data pools
SCENARIO: Scale out SQL Server with Hadoop and Spark to unlock Big Data
Scenario: A car manufacturing company wants to join data from across multiple sources including Cloudera for sensor data, SQL Server that has customer data with PII (that they want to keep in SQL and mask), and Cosmos DB that has connected car data in Azure. Today the car manufacturer has a Cloudera cluster with 100 nodes on-prem and 1.5 Petabytes of data. They are running into issues with Hive performance for interactive queries.
Problem: To join this data now, they would have to move it into a single system, a huge undertaking.
Solution: Create a shared, scalable data lake based on HDFS. Expose these datasets as a shared semantic layer so that business analysts can use them without moving the data.
In this scenario, you still have to refresh the data periodically (the data still lives in its home database, not in the data lake).
#23: Application developers can access relational and non-relational data using familiar T-SQL commands
#27: Challenge: Managing relational and Big Data is complicated
Pillar: Simplify big data management with improved performance and security
#28: Data scientists can easily analyze data through Spark jobs
#29: Easy tooling to ingest, store, prep & train, model, and serve high-velocity data using unified data management
Spark streaming in the box
Data preparation tools
Integrated Jupyter notebooks
#30: MLeap is an open-source effort: a common serialization format and execution engine. Train your pipeline and score it on a JVM. Supports Spark ML, scikit-learn, and TensorFlow. The core execution engine is implemented in Scala, and there are two portable serialization formats to choose from (JSON, Protobuf). An MLeap DataFrame serializes to JSON/Avro/binary.
#31: Bring your own Java:
- Windows: 1.10 (1.8 support is coming)
- Linux: 1.8 support
#36: Get started today with the SQL Server v.Next preview by going to https://aka.ms/eapsignup