This presentation introduces the audience to DataOps and AIOps practices. It covers organizational and technical aspects, and provides hints to start your data journey.
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines (DATAVERSITY)
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President, and Ravi Murugesan, Sr. Solution Engineer, to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
Marc embraces database virtualization and containers to help Dave's development team overcome data issues slowing their work. Virtualizing the database and creating "data pods" allows self-service access and the ability to quickly provision testing environments. This enables the team to work more efficiently and meet sprint goals. DataOps is introduced to fully integrate data into DevOps practices, removing it as a bottleneck through tools that provide versioning, automation and developer-friendly interfaces.
How to use Azure Machine Learning service to manage the lifecycle of your models. Azure Machine Learning uses a Machine Learning Operations (MLOps) approach, which improves the quality and consistency of your machine learning solutions.
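As a hedged illustration of that lifecycle management, here is a minimal sketch of registering a model version with the Azure ML Python SDK (v1); the workspace config file, model path, and tags are assumptions, not taken from the deck.

```python
# Minimal sketch: registering a model version with the Azure ML Python SDK (v1).
# Assumes a config.json downloaded from the workspace and a locally trained model file.
from azureml.core import Workspace, Model

ws = Workspace.from_config()  # reads ./config.json for subscription/resource group/workspace

model = Model.register(
    workspace=ws,
    model_path="outputs/churn_model.pkl",   # hypothetical local artifact
    model_name="churn-model",                # each register() call creates a new version
    tags={"stage": "dev", "framework": "sklearn"},
)
print(model.name, model.version)
```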
DataOps - The Foundation for Your Agile Data Architecture (DATAVERSITY)
Achieving agility in data and analytics is hard. It’s no secret that most data organizations struggle to deliver the on-demand data products that their business customers demand. Recently, there has been much hype around new design patterns that promise to deliver this much sought-after agility.
In this webinar, Chris Bergh, CEO and Head Chef of DataKitchen, will cut through the noise and describe several elegant and effective data architecture design patterns that deliver low errors, rapid development, and high levels of collaboration. He’ll cover:
• DataOps, Data Mesh, Functional Design, and Hub & Spoke design patterns;
• Where Data Fabric fits into your architecture;
• How different patterns can work together to maximize agility; and
• How a DataOps platform serves as the foundational superstructure for your agile architecture.
A Questionnaire for Identifying Failures in the Business Analysis Phase of ERP Projects (Varuna Harshana)
Identifying business needs and designing solutions is done through the process of business analysis. Many solutions are developed to meet business needs, including the implementation of ERP systems. ERP systems cut across many business processes and therefore create many complexities while being designed. The probability that these complexities turn into failures is high. The business analyst is largely responsible for handling these issues and reducing them. Business analysis includes several phases, shown as follows:
Enterprise/Company analysis
Requirements planning and management
Requirements elicitation
Requirements analysis and documentation
Requirements communication
Solution assessment and validation
The above-mentioned phases need to be executed in order to do a proper business analysis. Many aspects need to be considered and standards need to be followed in doing this analysis, so, as mentioned earlier, there is a high probability that these phases will not function in the expected manner. The potential process failures therefore need to be identified.
This can be done by preparing a questionnaire that monitors important elements of each of the above-mentioned phases of business analysis. The questions address many aspects such as the standards used, the tools used, the parties responsible, causes of actions, etc. In this manner the questionnaire can readily identify the failures that could occur while carrying out the business analysis stage.
The questionnaire that we prepared clearly indicates how effectively anyone could point out potential failures of the business analysis stage.
AIOps is becoming imperative to the management of today’s complex IT systems and their ability to support changing business conditions. This slide explains the role that AIOps can and will play in the enterprise of the future, how the scope of AIOps platforms will expand, and what new functionality may be deployed.
Watch the webinar here. https://www.moogsoft.com/resources/aiops/webinar/aiops-the-next-five-years
This document discusses MLOps and Kubeflow. It begins with an introduction to the speaker and defines MLOps as addressing the challenges of independently autoscaling machine learning pipeline stages, choosing different tools for each stage, and seamlessly deploying models across environments. It then introduces Kubeflow as an open source project that uses Kubernetes to minimize MLOps efforts by enabling composability, scalability, and portability of machine learning workloads. The document outlines key MLOps capabilities in Kubeflow like Jupyter notebooks, hyperparameter tuning with Katib, and model serving with KFServing and Seldon Core. It describes the typical machine learning process and how Kubeflow supports experimental and production phases.
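As a rough sketch of how a Kubeflow pipeline is expressed in code, here is a minimal example using the v1 kfp SDK; the images and commands are placeholders and are not taken from the document.

```python
# Minimal sketch of a two-step Kubeflow pipeline using the v1 kfp SDK.
# Each step runs as a container; the images and commands are placeholders.
import kfp
from kfp import dsl

@dsl.pipeline(name="train-pipeline", description="Toy two-step pipeline")
def pipeline():
    prep = dsl.ContainerOp(
        name="prepare-data",
        image="python:3.9",
        command=["python", "-c", "print('preparing data')"],
    )
    train = dsl.ContainerOp(
        name="train-model",
        image="python:3.9",
        command=["python", "-c", "print('training model')"],
    )
    train.after(prep)  # enforce ordering between the two stages

if __name__ == "__main__":
    # Compile to a package that can be uploaded to a Kubeflow Pipelines instance
    kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")
```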
Amidst an industry cloud of confusion about what “AIOps” is and what it can do, these slides, based on the webinar from EMA research, delineate a clear path to victory for business and IT stakeholders seeking to use machine learning to optimize the performance of critical business services.
DataOps: An Agile Method for Data-Driven Organizations (Ellen Friedman)
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
Applying DevOps to Databricks can be a daunting task. In this talk it will be broken down into bite-size chunks. Common DevOps subject areas will be covered, including CI/CD (Continuous Integration/Continuous Deployment), IaC (Infrastructure as Code), and Build Agents.
We will explore how to apply DevOps to Databricks (in Azure), primarily using Azure DevOps tooling. As many Spark/Databricks users are Python users, we will focus on the Databricks REST API (using Python) to perform our tasks.
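As a hedged sketch of driving Databricks from Python over its REST API, for example from a build agent, the snippet below lists clusters and triggers an existing job; the host and token environment variables and the job ID are assumptions.

```python
# Minimal sketch: calling the Databricks REST API from Python with requests,
# as a CI/CD pipeline step might. Host URL, token variable, and job ID are assumptions.
import os
import requests

host = os.environ["DATABRICKS_HOST"]      # e.g. the workspace URL
token = os.environ["DATABRICKS_TOKEN"]    # personal access token injected by the build agent
headers = {"Authorization": f"Bearer {token}"}

# List clusters in the workspace (Clusters API 2.0)
clusters = requests.get(f"{host}/api/2.0/clusters/list", headers=headers).json()
print([c["cluster_name"] for c in clusters.get("clusters", [])])

# Trigger an existing job run (Jobs API 2.1)
run = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers=headers,
    json={"job_id": 1234},                # hypothetical job ID
).json()
print("started run", run.get("run_id"))
```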
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as experiment tracking, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty with a quick ML project using MLflow, released to production, to understand the MLOps lifecycle.
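A minimal sketch of MLflow's Tracking component in action follows; the experiment name and toy model are illustrative only, not part of the session.

```python
# Minimal sketch of MLflow Tracking: log parameters, metrics, and a model from a run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("quickstart")            # hypothetical experiment name
with mlflow.start_run():
    C = 0.5
    model = LogisticRegression(C=C, max_iter=200).fit(X_train, y_train)
    mlflow.log_param("C", C)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    # The logged model can later be promoted through the Model Registry
    mlflow.sklearn.log_model(model, artifact_path="model")
```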
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Using MLOps to Bring ML to Production/The Promise of MLOps (Weaveworks)
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train, and serve ML models, and how to orchestrate between them? While DevOps and GitOps have gained huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
MLOps and Data Quality: Deploying Reliable ML Models in Production (Provectus)
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality (a minimal sketch follows this list)
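To make the data-quality idea concrete, here is a minimal, hedged sketch of the kind of validation gate such a pipeline might run; the column names, thresholds, and the mention of Great Expectations are illustrative assumptions, not Provectus's actual implementation.

```python
# Minimal sketch of a data-quality gate that could sit in a validation pipeline.
# Column names and thresholds are hypothetical; a real setup would typically use a
# dedicated library (e.g. Great Expectations) and report results to monitoring.
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Return a list of human-readable data-quality violations."""
    issues = []
    if df["user_id"].isnull().any():
        issues.append("user_id contains nulls")
    if df["user_id"].duplicated().any():
        issues.append("user_id contains duplicates")
    if not df["amount"].between(0, 1_000_000).all():
        issues.append("amount outside expected range [0, 1e6]")
    return issues

df = pd.DataFrame({"user_id": [1, 2, 2, None], "amount": [10.0, 25.5, -3.0, 99.0]})
problems = validate(df)
if problems:
    # This toy frame fails on purpose to show the gate stopping the pipeline
    raise ValueError("Data quality gate failed: " + "; ".join(problems))
```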
The document discusses moving from data science to MLOps. It defines MLOps as extending DevOps methodology to include machine learning, data science, and data engineering assets. Key concepts of MLOps include iterative development, automation, continuous integration and delivery, versioning, testing, reproducibility, monitoring, source control, and model/feature stores. MLOps helps address challenges of moving models to production like the deployment gap by establishing best practices and tools for testing, deploying, managing, and monitoring models.
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle (Databricks)
This document summarizes a webinar on building machine learning platforms. It discusses how operating ML models is complex, requiring tasks like monitoring performance, handling data drift, and ensuring governance and security. It then outlines common components of ML platforms, including data management, model management, and code/deployment management. The webinar will demonstrate how different organizations handle these components and include demos from four companies. It will also cover Databricks' approach to providing an ML platform that integrates various tools and simplifies the full ML lifecycle from data preparation to deployment.
Machine Learning operations brings data science to the world of DevOps. Data scientists create models on their workstations. MLOps adds automation, validation, and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps and what tasks are suggested to accelerate your team’s machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
Delta Lake is an open-source innovation that brings new capabilities for transactions, version control, and indexing to your data lakes. We uncover how Delta Lake benefits you and why it matters. Through this session, we showcase some of its benefits and how they can improve your modern data engineering pipelines. Delta Lake provides snapshot isolation, which helps concurrent read/write operations and enables efficient insert, update, delete, and rollback capabilities. It allows background file optimization through compaction and Z-order partitioning, achieving better performance. In this presentation, we will learn about Delta Lake's benefits, how it solves common data lake challenges, and, most importantly, the new Delta Time Travel capability.
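A minimal sketch of Delta Lake's Time Travel from PySpark follows; the table path and the local Spark session configuration are assumptions (a Databricks runtime would already have Delta configured).

```python
# Minimal sketch of Delta Lake time travel with PySpark. The path is a placeholder and the
# session assumes the delta-spark package is available on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-time-travel")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/events_delta"                                        # hypothetical table location
df = spark.range(0, 100).withColumnRenamed("id", "event_id")
df.write.format("delta").mode("overwrite").save(path)             # creates version 0
df.limit(10).write.format("delta").mode("overwrite").save(path)   # creates version 1

# Read the table as of an earlier version (time travel)
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(v0.count())  # 100 rows, even though the latest version holds 10
```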
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
MLOps: Bridging the Gap Between Data Scientists and Ops (Knoldus Inc.)
Through this session we're going to introduce the MLOps lifecycle and discuss the hidden loopholes that can affect an ML project. Then we are going to discuss the ML model lifecycle and the problems with training. We're going to introduce the MLflow Tracking module in order to track experiments.
Understanding DataOps and Its Impact on Application Quality (DevOps.com)
Modern-day applications are data-driven and data-rich. The infrastructure your backends run on is a critical aspect of your environment and requires unique monitoring tools and techniques. In this webinar, learn what DataOps is and how critical good DataOps is to the integrity of your application. Intelligent APM for your data is critical to the success of modern applications. In this webinar you will learn:
The power of APM tailored for Data Operations
The importance of visibility into your data infrastructure
How AIOps makes data ops actionable
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
The document provides an overview of DataOps and continuous integration/continuous delivery (CI/CD) practices for data management. It discusses:
- DevOps principles like automation, collaboration and agility can be applied to data management through a DataOps approach.
- CI/CD practices allow for data products and analytics to be developed, tested, and released continuously through an automated pipeline. This includes orchestration of the data pipeline, testing, and monitoring (a minimal test sketch follows this list).
- Adopting a DataOps approach with CI/CD enables faster delivery of data and analytics, more efficient and compliant data pipelines, improved productivity, and better business outcomes through data-driven decisions.
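As a rough illustration of the testing step in such a CI/CD pipeline (not a prescription from the document), here is a minimal pytest-style check of a data transformation; the function and its business rule are hypothetical.

```python
# Minimal sketch of a CI-stage test for a data transformation, runnable with pytest.
# The transformation and its business rule are illustrative only.
import pandas as pd

def deduplicate_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the latest row per order_id (the 'transformation under test')."""
    return df.sort_values("updated_at").drop_duplicates("order_id", keep="last")

def test_deduplicate_orders_keeps_latest():
    raw = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "amount": [10.0, 12.0, 7.5],
            "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        }
    )
    out = deduplicate_orders(raw)
    assert len(out) == 2
    assert out.loc[out.order_id == 1, "amount"].item() == 12.0
```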
1) Databricks provides a machine learning platform for MLOps that includes tools for data ingestion, model training, runtime environments, and monitoring.
2) It offers a collaborative data science workspace for data engineers, data scientists, and ML engineers to work together on projects using notebooks.
3) The platform provides end-to-end governance for machine learning including experiment tracking, reproducibility, and model governance.
Big Data Tools: A Deep Dive into Essential Tools (FredReynolds2)
Today, practically every firm uses big data to gain a competitive advantage in the market. With this in mind, freely available big data tools for analysis and processing are a cost-effective and beneficial choice for enterprises. Hadoop is the sector’s leading open-source initiative and the big data heavyweight. Moreover, this is not the final chapter! Numerous other businesses pursue Hadoop’s free and open-source path.
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... (Shirshanka Das)
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #datasciencehappiness.
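As a rough illustration of a producer-side data contract (not LinkedIn's Dali or its internal tooling), here is a kafka-python sketch in which the topic name and event fields are hypothetical.

```python
# Rough illustration: a producer that enforces a small, explicit data contract before
# publishing to Kafka. Topic and fields are hypothetical.
import json
from kafka import KafkaProducer

REQUIRED_FIELDS = {"member_id": int, "page": str, "timestamp_ms": int}  # the "contract"

def validate(event: dict) -> None:
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in event or not isinstance(event[field], ftype):
            raise ValueError(f"event violates contract on field '{field}'")

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"member_id": 42, "page": "/jobs", "timestamp_ms": 1700000000000}
validate(event)                      # reject contract violations at the producer side
producer.send("page-views", event)   # hypothetical topic name
producer.flush()
```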
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at... (Yael Garten)
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
Developing and deploying AI solutions on the cloud using Team Data Science Pr... (Debraj GuhaThakurta)
Presented at the Global Big AI Conference, Santa Clara, January 2018: Developing and deploying AI solutions on the cloud using the Team Data Science Process (TDSP) and Azure Machine Learning (AML).
The document contains the resume of Naveen Reddy Tamma which summarizes his work experience and qualifications. He has over 7 years of experience working as an Associate at Cognizant Technology Solutions on various projects involving Informatica ETL development, data quality, and reporting. He holds a B.Tech in Computer Science and has experience with technologies like Informatica, Teradata, Oracle, and Cognos.
The document contains the resume of Naveen Reddy Tamma which summarizes his work experience and qualifications. He has over 7 years of experience working as an Associate at Cognizant Technology Solutions on various projects involving Informatica ETL development, data quality testing, and report generation. He holds a B.Tech in Computer Science and has experience working with technologies like Informatica, Teradata, Oracle, and Cognos.
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu... (DianaGray10)
This session is focused on the art of application architecture, where we unravel the intricacies of creating a standard, yet dynamic application structure.
We'll explore:
Essential components of a typical application, emphasizing their roles and interactions.
Learn how to connect UiPath RPA Processes, UiPath Apps, and Data Service together to build a stronger app.
Gain insights into building more efficient, interconnected, and robust applications in the UiPath ecosystem.
Speaker:
David Kroll, Director, Product Marketing @Ashling Partners and UiPath MVP
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning (Kai Wähner)
Comparison of Data Preparation vs. Data Wrangling Programming Languages, Frameworks and Tools in Machine Learning / Deep Learning Projects.
A key task in creating appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources like files, databases, big data stores, sensors, or social networks. This step can take up to 80% of the whole project.
This session compares different alternative techniques to prepare data, including extract-transform-load (ETL) batch processing (like Talend, Pentaho), streaming analytics ingestion (like Apache Storm, Flink, Apex, TIBCO StreamBase, IBM Streams, Software AG Apama), and data wrangling (DataWrangler, Trifacta) within visual analytics. Various options and their trade-offs are shown in live demos using different advanced analytics technologies and open source frameworks such as R, Python, Apache Hadoop, Spark, KNIME or RapidMiner. The session also discusses how this is related to visual analytics tools (like TIBCO Spotfire), and best practices for how the data scientist and business user should work together to build good analytic models.
Key takeaways for the audience:
- Learn various options for preparing data sets to build analytic models
- Understand the pros and cons and the targeted persona for each option
- See different technologies and open source frameworks for data preparation
- Understand the relation to visual analytics and streaming analytics, and how these concepts are actually leveraged to build the analytic model after data preparation
Video Recording / Screencast of this Slide Deck: https://youtu.be/2MR5UynQocs
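To make the comparison concrete, here is a minimal scripted data-preparation step in Python/pandas, one of the options discussed above; the input file and column names are hypothetical and not taken from the talk.

```python
# Minimal sketch of a scripted data-preparation step using pandas.
# The input file and column names are hypothetical.
import pandas as pd

raw = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])

prepared = (
    raw.dropna(subset=["device_id"])                                        # drop rows missing the key
       .assign(temperature_c=lambda d: (d["temperature_f"] - 32) * 5 / 9)   # unit conversion
       .query("temperature_c >= -40 and temperature_c <= 85")               # remove implausible readings
       .groupby(["device_id", pd.Grouper(key="timestamp", freq="1H")])      # hourly aggregation
       .agg(mean_temp=("temperature_c", "mean"), n=("temperature_c", "size"))
       .reset_index()
)
prepared.to_parquet("sensor_readings_hourly.parquet", index=False)
```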
This document discusses how data science models have transitioned to the cloud to take advantage of greater computing resources. It notes that data science models are resource-intensive and traditionally required powerful local machines. The cloud allows data scientists to run models on cloud infrastructure for lower costs than high-end laptops and with access to many GPUs. Several major cloud platforms - Azure, AWS, and Google Cloud - are discussed and compared in terms of their machine learning offerings. The document also introduces Microsoft's Team Data Science Process, which aims to help data science teams collaborate more effectively on projects in the cloud.
Jeff has over 33 years of experience in IT consulting, product development, and system operations. He has expertise in big data technologies including Hadoop, Spark, and Hive. Most recently as a Big Data Architect, he helped customers optimize data warehouse workloads on Hadoop. He also led teams to design and build innovative tools for automating data warehouse migrations to Hadoop. Jeff has extensive experience developing, operating, and administering large-scale production environments and big data initiatives.
Shivaprasada Kodoth is seeking a position as an ETL Lead/Architect with experience in data warehousing and ETL. He has over 8 years of experience in data warehousing and Informatica design and development. He is proficient in technologies like Oracle, Teradata, SQL, and PL/SQL. Some of his key projects include developing ETL mappings and workflows for integrating various systems at BoheringerIngelheim and UBS. He is looking for opportunities in Bangalore, Mangalore, Cochin, Europe, USA, Australia, or Singapore.
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal, such as notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session, learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP, with a focus on integration, security, and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
This document discusses best practices for developing data science products at Philip Morris International (PMI). It covers:
- PMI's data science team of over 40 people across four hubs working on fraud prevention and other problems.
- Key principles for PMI's data science work, including being business-driven, investing in people, self-organizing, iterating to improve, and co-creating solutions.
- Challenges in data product development involving integrating work between data scientists and other teams, and practices like continuous integration/delivery to overcome these challenges.
- The role of data scientists in contributing code that is readable, testable, reusable, reproducible, and usable by other teams to integrate into
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …), who often do not speak to each other very well; the model can be trained in one environment but then productionalized in a completely different environment; and it is not just about the code, but also about the data (features) and the model itself… At DataSentics, as a machine learning and cloud engineering studio, we see this struggle firsthand, on our internal projects and on clients' projects as well.
This document provides an introduction to big data and analytics. It discusses definitions of key concepts like business intelligence, data analysis, and big data. It also provides a brief history of analytics, describing how technologies have evolved from early business intelligence systems to today's big data approaches. The document outlines some of the key components of Hadoop, including HDFS and MapReduce, and how it addresses issues like volume, variety and velocity of big data. It also discusses related technologies in the Hadoop ecosystem.
Data Science and Machine Learning platforms are groups of technologies that provide users with tools to create, maintain, and monitor machine learning algorithms. This software combines smart decision-making and problem-solving algorithms with data, thereby permitting developers to build a business solution. Some platforms deliver preview algorithms and basic workflows with benefits such as drag-and-drop modeling and graphic interfaces that simply connect essential data to the end solution, while others require deeper development knowledge and coding ability. These platforms can include functionality for image processing, natural language processing, speech recognition, and recommendation systems, as well as other machine learning capabilities.
Agile Testing Days 2017: Introducing Agile BI Sustainably - Exercises (Raphael Branger)
"We now do Agile BI too” is often heard in todays BI community. But can you really "create" agile in Business Intelligence projects? This presentation shows that Agile BI doesn't necessarily start with the introduction of an iterative project approach. An organisation is well advised to establish first the necessary foundations in regards to organisation, business and technology in order to become capable of an iterative, incremental project approach in the BI domain.
In this session you learn which building blocks you need to consider. In addition, you will see a meaningful sequence for these building blocks. Selected aspects like test automation, BI-specific design patterns, and the Disciplined Agile framework will be explained in more practical detail.
Coding software and tools used for data science management - Phdassistance (phdAssistance1)
The technique of extracting usable information from data is known as data science. This is the procedure for collecting, modelling, and analysing data in order to address real-world issues. Data science tools have been developed as a result of the vast range of applications and rising demand. The following section goes through the greatest data science tools in detail. The most notable attribute of these tools is that they do not require the usage of programming languages to implement data science.
Read More: https://bit.ly/3rbp1Lb
This document discusses DevOps and MLOps practices for machine learning models. It outlines that while ML development shares some similarities with traditional software development, such as using version control and CI/CD pipelines, there are also key differences related to data, tools, and people. Specifically, ML requires additional focus on exploratory data analysis, feature engineering, and specialized infrastructure for training and deploying models. The document provides an overview of how one company structures their ML team and processes.
This document discusses various tools and technologies used in data science. It covers popular programming languages like Python, R, Java and C++; databases like MySQL, NoSQL, SQL Server and Oracle; data analytics tools like SAS, Tableau, SPSS and Excel; APIs like TensorFlow; servers and frameworks like Hadoop and Spark; and compares SQL and NoSQL databases. It provides details on languages and tools like R, Python, Excel, SAS, SPSS and discusses their uses and popularity in data science.
Ansible, Terraform, CloudFormation, [insert your favorite tech here]… Infra-as-code solutions abound. So why talk about the latest trendy offspring hosted by the CNCF? Alright, let's spoil it a bit! Built on Kubernetes, Crossplane lets you converge the delivery of a containerized app with all the other resources it needs outside of your favorite K8S cluster to work correctly: an S3 bucket, a managed database, etc. You thus orchestrate the lifecycle of your complete application from a single perspective. Add to that easier multicloud and a real ability to fit into a GitOps approach, and you get a very effective solution for organizing your next deployments!
This presentation explains what serverless is all about, explaining the context from the Dev & Ops points of view and presenting the various ways to achieve serverless (Functions as a Service, BaaS…). It also presents the various competitors on the market and demos one of them, OpenFaaS. Finally, it enlarges the picture, positioning serverless, combined with edge computing & IoT, as a valuable triptych on top of which cloud vendors are building end-to-end offers.
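As a hedged sketch of the FaaS side, here is a minimal OpenFaaS handler following the classic python3 template convention (a handle(req) entry point); the greeting logic is just a placeholder, not taken from the demo.

```python
# Minimal sketch of an OpenFaaS function handler (classic python3 template): the gateway
# invokes handle() with the raw request body and returns whatever the function returns.
import json

def handle(req):
    """Entry point expected by the OpenFaaS python3 template."""
    try:
        payload = json.loads(req) if req else {}
    except json.JSONDecodeError:
        payload = {"raw": req}
    name = payload.get("name", "world")
    return json.dumps({"message": f"hello, {name}"})
```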
Two self-managed Docker clusters are deployed on public clouds and fight each other in a ruthless battle. One has been designed to resist any form of threat. The other one's only aim is to destroy the first one. Who's going to win?
Although it's presented as entertainment, this talk will show off two serious platforms leveraging different principles. Beyond the technical aspects covered (Swarm/Kubernetes orchestration, IaaS clouds, various tools such as Terraform, kops or Helm), it will be the opportunity to discuss broader architecture topics such as immutable infrastructure, hybridization, microservices, etc.
DevOps at scale: what we did, what we learned at Societe Generale (Adrien Blind)
The following talk discusses Societe Generale's transformation journey to DevOps, and more broadly to continuous delivery principles, inside a large, traditional company. It emphasizes the importance of practices over tooling, a human-centric approach leveraging massively on coaching, and our "framework" approach to make it scale up to the IS level.
It was initially delivered at the DevOps Rex conference, with teammate Laurent Dussault, also a DevOps coach at Societe Generale.
Unleash software architecture leveraging on Docker (Adrien Blind)
The following talk first comes back to key aspects of microservices architectures. It then shifts to Docker, explaining in this context the benefits of containers and especially the new orchestration features that appeared with version 1.12.
Docker, cornerstone of cloud hybridization? [Cloud Expo Europe 2016] (Adrien Blind)
The following talk discusses the opportunity to leverage Docker to create a hybrid logical cloud, built simultaneously on top of traditional datacenters and public cloud vendors and enabling the management of new kinds of containers (Windows, Linux on ARM). It also discusses the value of such a capacity for applications in a context of topology orchestration and microservice-oriented applications.
DevOps at scale: what we did, what we learned at Societe Gen... (Adrien Blind)
The following talk discusses Societe Generale's transformation journey to DevOps, and more broadly to continuous delivery principles, inside a large, traditional company. It emphasizes the importance of practices over tooling, a human-centric approach leveraging massively on coaching, and our "framework" approach to make it scale up to the IS level.
It was initially delivered at the DevOps Rex conference, with teammate Laurent Dussault, also a DevOps coach at Societe Generale.
Docker, cornerstone of a hybrid cloud? (Adrien Blind)
In this presentation, I propose to explore the orchestration and hybridization potential unlocked by Docker 1.12 Swarm Mode and the subsequent benefits.
I'll first recall why Docker fits the microservices paradigm well, and how this architecture engenders new challenges: service discovery, app-centric security, scalability & resilience, and of course, orchestration.
I'll then discuss the opportunity to create your own Docker CaaS platform, hybridizing simultaneously across various cloud vendors and traditional datacenters, rather than just leveraging vendors' integrated offers.
Finally, I'll discuss the rise of new technologies (Windows containers, ARM architectures) in the Docker landscape, and the opportunity to integrate them in a global Docker composite orchestration, making it possible to depict complex apps globally.
Octo breakfast briefing - Infrastructure at the service of its projects (Adrien Blind)
This presentation looks back at Société Générale's IT infrastructure automation project, in the broader context of rolling out continuous delivery and DevOps practices and tools.
Since many apps are not just about a single container, this talk discusses the ability and benefits of creating a hybrid Docker cluster capacity leveraging Linux and Windows OSes and x86 and ARM architectures.
Moreover, the Docker nodes composing this cloud will be hosted across several providers (local DC, cloud vendors such as Azure or AWS), in order to address various scenarios (cloud migration, elasticity...).
DevOps, NoOps, everything-as-code, commoditization… What future for Ops? (Adrien Blind)
Implementing continuous delivery puts new pressure on Ops, as an application's infrastructure and operability must now be built at the growing pace of delivered iterations. In parallel, architecture patterns are evolving too: resilience and scalability are increasingly handled within the applications themselves, gradually reducing infrastructure to a commodity. Finally, Dev teams keep asking for more autonomy and for an ergonomics better suited to their needs; cloud providers and star solutions such as Docker have understood this well, offering products that speak to Devs directly, and the temptation of NoOps keeps growing.
The challenge for Ops is therefore to propose a positioning and an offering in resonance with these new expectations. The challenges are numerous, covering both technical aspects (infra-as-code, software-defined network/storage, IS hybridization…) and non-technical ones (agility, craftsmanship, DevOps…).
Devs taking over the Ops role, Ops acquiring Dev skills… In this session we explore these deep cultural and technical shifts, and we share a few recipes for the greater benefit of Ops… and Devs alike. As Audiard wrote, "When things change, they change... Never let yourself be thrown off"!
Introduction to Unikernels at first Paris Unikernels meetupAdrien Blind
This is an introduction to unikernels and their impact on architecture and IT organizations (the deck is in French; I'll translate it shortly). I produced this talk for the first Paris Unikernels Meetup.
When Docker Engine 1.12 features unleashes software architectureAdrien Blind
This slide deck deals with the new features delivered with Docker Engine 1.12, in the larger context of application architecture and security. It was presented at Voxxed Days Luxembourg 2016.
The document discusses full stack automation and DevOps. It introduces Clément Cunin and Adrien Blind and their roles. Some key benefits discussed are reduced time to market, repeatability, and serenity. Methods discussed include deploying new releases daily with a 15 minute commit to production time, treating infrastructure as code, using ephemeral environments, and measuring everything.
This presentation discusses how to achieve continuous delivery, leveraging on docker containers, here used as universal application artifacts. It has been presented at Voxxed '15 Bucharest.
Docker: Redistributing DevOps cards, on the way to PaaSAdrien Blind
This talk first presents Docker through its key characteristics: being Portable, Disposable, Live, Social. It then discusses a new type of cloud, the CaaS (Container as a Service), and its potential benefits for PaaS (Platform as a Service).
Docker, Pierre angulaire du continuous delivery ?Adrien Blind
This presentation explores continuous delivery principles applied with Docker: it depicts the use of Docker containers as universal application artifacts, flowing through every stage of a deployment pipeline.
This slideshow was initially presented at the DevOps D-Day conference in Marseille.
Identity & Access Management in the cloudAdrien Blind
This presentation discusses the evolution of the IAM (Identity & Access Management) problem space, in a context where enterprise IS is increasingly externalized and opened up (B2B, B2C), with massive reliance on the cloud.
The talk particularly focuses on IAM SSO and federation topics, and the corresponding technologies (SAML, OpenID, OAuth...).
The missing piece : when Docker networking and services finally unleashes so...Adrien Blind
Docker now provides several building blocks, combining engine, clustering, and componentization, while the new networking and service features enable many new use cases such as multi-tenancy. In this session, you will first discover the new experimental networking and service features expected soon, and then move quickly to software architecture, explaining how a complete Docker stack unleashes microservices paradigms.
The first part of the talk introduces SDNs and service registries to the audience and covers the corresponding experimental network and service features of Docker, with a technical focus. For instance, it explains how to create an overlay network on top of a Swarm cluster or how to publish services.
The second part moves from infrastructure to application concerns, explaining that application architecture paradigms are shifting. In particular, we discuss the growing porosity of companies' IS (especially due to the massive use of cloud services), shifting security boundaries from the global IS perimeter to the application itself. We also note that traditional SOA patterns relying on buses (i.e. ESBs & ETLs) are being replaced by microservices promoting more direct, full-mesh interactions. To complete the picture, we quickly recall other trends already covered by other Docker components: scalability and resiliency supported by the apps themselves, fine-grained applications, and even infrastructure commoditization…
Most of all, the last part depicts a concrete, state-of-the-art application, applying all the properties discussed previously and relying on a multi-tenant Docker full stack using the new networking and service features, in addition to the traditional Swarm, Compose, and Engine components. And just because we say it doesn't mean it's true: we'll be happy to demonstrate it live!
Introduction to DataOps and AIOps (or MLOps)
1. An introduction to DataOps & AIOps (or MLOps)
Adrien Blind (@adrienblind)
Disclaimer and credits:
Parts of this presentation were built with former teammates, outside the context of Saagie:
- a broader talk initially co-developed and co-delivered with Frederic Petit for the DevOps D-Day and Snow Camp conferences. Original slides here: https://bit.ly/2Ci3Ilh
- a talk discussing Continuous Delivery and DevOps, co-developed and co-delivered with Laurent Dussault for the DevOps Rex conferences. Slides here: https://bit.ly/2CmEIcB
5. The point is to operationalize data projects
From Proof of Concept to Operational product:
● Robust, resilient
● Scalable
● Secure
● Updatable
● Shareable
6. Value is hard to demonstrate
Long time to implement
Rarely deployed in production
Only 27% of CxOs considered their Big Data projects valuable
12 to 18 months to build and deploy AI pilots
Only 15% of AI projects have been deployed
Sources:
- Gartner’s CIO Survey (2018)
- The Big Data Payoff: Turning Big Data into Business Value (Capgemini and Informatica survey, 2016)
- BCG, Putting Artificial Intelligence to Work, September 2017
Challenges delivering value from Big Data / AI
8. DIY, time/budget-consuming, multi-skills, high-risk approach
Challenges ㅡ Process
The do-it-yourself flow spans many steps: Grant access; Connect databases / files; Integrate data frameworks; Deploy test jobs & validate models; Define new policies; Change algos and integrate new libs; Rewrite/build ETL codes to prod; Deploy prod jobs; Monitor & audit activity; Write/Build ML codes; Write/Build ETL codes; Provision cluster(s); Align processes w/ business reqs; Rewrite/build ML codes to prod.
Roles involved across these steps: IT Ops, Security, Data Engineer, Data Scientist, Data Steward, Business Analyst.
9. Barriers across the organization: silos and different cultures!
Challenges ㅡ People & organization
BUSINESS: Data Analyst, Data Steward
ANALYTICS TEAM: Data Engineer, Data Scientists
IT: IT Ops, IT Architect & Coders
11. How did DevOps solve it for the app landscape?
Manual processing
Have a look at the complete DevOps introduction here: https://bit.ly/3gE5Hj4
12. Back to DevOps: “You build it, you run it”
Strong automation
Have a look at the complete DevOps introduction here: https://bit.ly/3gE5Hj4
14. Information Technology (on premises, cloud, etc.)
#0 ITOps: provide compute & storage to host data processing / models / app code
Infrastructure landscape: infrastructure driven
15. #1 DevOps - Build, deliver & run apps: developers need pipelines to deliver innovative apps (continuous improvement)
#0 ITOps: provide compute & storage to host data processing / models / app code
The operational I.S. (apps, ERP, CRM…) is API centric. Input and output are business features as APIs: APIs used internally & shared externally, plus external APIs you consume.
Application landscape: API driven
Information Technology (on premises, cloud, etc.)
16. #1 DevOps - Build, deliver & run apps: developers need pipelines to deliver innovative apps (continuous improvement)
#2 DataOps - Process & share data: data engineers need pipelines to deliver a capital of data (continuous improvement)
#0 ITOps: provide compute & storage to host data processing / models / app code
Inputs on the data side: internal raw data generated by your apps, plus external data you consume (open data, data from partners...).
The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs (used internally & shared externally), plus external APIs you consume.
The Data Information System is data-processing centric: input is data, output is data and data models. It is generally not directly plugged into the operational IS (you copy data and process it there).
Data processing landscape: data driven
Information Technology (on premises, cloud, etc.)
17. #1 DevOps - Build, deliver & run apps: developers need pipelines to deliver innovative apps (continuous improvement)
#2 DataOps - Process & share data: data engineers need pipelines to deliver a capital of data (continuous improvement)
#0 ITOps: provide compute & storage to host data processing / models / app code
Inputs: internal raw data generated by your apps, plus external data you consume (open data, data from partners...).
Outputs: datasets, delivered as shared datamarts and more & more as APIs (for analytics, and to provide training sets for AI); data you share externally; data you share back to the operational IS.
The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs (used internally & shared externally), plus external APIs you consume.
The Data Information System is data-processing centric: input is data, output is data and data models.
Data processing landscape outputs
18. #3 AIOps - Explore & build models: data scientists need pipelines to deliver valuable models (continuous improvement)
#1 DevOps - Build, deliver & run apps: developers need pipelines to deliver innovative apps (continuous improvement)
#2 DataOps - Process & share data: data engineers need pipelines to deliver a capital of data (continuous improvement)
#0 ITOps: provide compute & storage to host data processing / models / app code
Inputs: internal raw data generated by your apps, plus external data you consume (open data, data from partners...); datasets provide training sets; performance drift analysis feeds back (to retrain & optimize models).
Outputs: models, to be bundled and run as APIs in the operational IS (a minimal sketch follows below); datasets as shared datamarts and more & more as APIs (for analytics); data you share externally; data you share back to the operational IS.
The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features as APIs (used internally & shared externally), plus external APIs you consume.
The Data Information System is data-processing centric: input is data, output is data and data models.
Data science landscape: model driven
Information Technology (on premises, cloud, etc.)
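To make the "models bundled and run as APIs" idea concrete, here is a minimal sketch, not taken from the deck, of a model exposed over HTTP using only the Python standard library. The pickled model file, the feature layout and the /predict route are illustrative assumptions.

```python
# Minimal sketch: serve a pickled, scikit-learn-style model as a prediction API.
# "churn_model.pkl", the /predict route and the payload shape are hypothetical.
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

with open("churn_model.pkl", "rb") as f:   # hypothetical model artifact produced by AIOps pipelines
    model = pickle.load(f)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))        # e.g. {"features": [1.0, 2.0, 3.0]}
        prediction = model.predict([payload["features"]])    # scikit-learn style predict()
        body = json.dumps({"prediction": prediction.tolist()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

A real deployment would add input validation, authentication and drift monitoring of the predictions, which is precisely where the AIOps feedback loop above comes in.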
19. AIOps needs DataOps
In the data landscape, the spotlight is on data analytics, and even more on data science/AI, which valorizes data in a revolutionary way… because it solves business challenges.
… But it first requires a capital of data to have been built up for processing!
Said differently, I like to say that… ( of AI ) ( DATA )
20. Summary: Pensé par les Devs… Pansé par les Ops!
(Designed by the Devs, patched up by the Ops: the pun works less well in English.)
#0 ITOps
Tech side: ITOps operationalizes the delivery of infrastructure assets. The purpose is to deliver an underlying platform on top of which assets will be hosted (apps / data processing / ML). CloudOps lands here, but is opinionated about how to achieve this.
Non-tech side: fosters collaboration between infrastructure teams working in project mode to deliver new assets, and those running them (support/run/monitoring, etc.).
#1 DevOps
Tech side: DevOps operationalizes the delivery of app code (automates, measures, etc.). The purpose is to deliver innovative services to the business.
Non-tech side: fosters collaboration between devs who build apps and ops responsible for deploying & running these apps. “You build it, you run it!”
#2 DataOps
Tech side: DataOps operationalizes the setup of data (automates data processing). The purpose is to deliver and shape a capital (of data).
Non-tech side: fosters collaboration between data engineers who own and shape the data, and ops deploying the underlying data processing jobs.
#3 AIOps
Tech side: AIOps operationalizes the delivery of models. The purpose is to deliver value.
Non-tech side: fosters collaboration between data scientists who explore data to build models, and ops delivering these as usable assets.
So, what about BizDevOps, ITSecOps, DevFinOps, etc.? Business, Security, Finance, etc. are transversal interlocutors and topics which have to be addressed anyway, whether we are speaking about DevOps, DataOps or AIOps.
22. Agile & DevOps are not enough for data projects
Agile + DevOps worked well for app-centric projects, where data was isolated. But data-centric projects trigger new, additional challenges!
● New players to involve: data scientists, data engineers... They may have a completely different background (mathematicians...) and approach technology differently. → Need a common understanding and appropriate ergonomics (notebooks, GUIs…)
● A recurring technology/language stack for the various types of jobs to handle: ingestion, data prep, modeling… → Need a ready-to-use toolbox
● Coordinate the various jobs applied to the data → Need job pipelining/orchestration
● Feed the dev process massively with production data (e.g. for machine learning) → Strengthen security (a minimal sketch follows below)
● Identify the patrimony (cataloging), share data, control spreading → Need governance
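For the "production data in dev" point above, one common mitigation is pseudonymizing sensitive fields before the data leaves production. Here is a minimal sketch, assuming records are plain dicts; the field names and salt are illustrative, not prescriptions from the deck.

```python
# Minimal sketch: replace sensitive fields with stable hashes before sharing
# production data with dev/test environments. Field names are hypothetical.
import hashlib

SENSITIVE_FIELDS = {"email", "customer_id"}
SALT = "rotate-me-per-environment"  # placeholder secret, managed outside the code in practice

def pseudonymize(record: dict) -> dict:
    """Return a copy of the record with sensitive fields replaced by short, stable hashes."""
    safe = dict(record)
    for field in SENSITIVE_FIELDS & record.keys():
        digest = hashlib.sha256((SALT + str(record[field])).encode()).hexdigest()
        safe[field] = digest[:12]  # stable pseudonym: joins across datasets remain possible
    return safe

prod_rows = [{"customer_id": "42", "email": "jane@example.com", "basket": 37.5}]
dev_rows = [pseudonymize(r) for r in prod_rows]
print(dev_rows)
```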
23. One DataOps definition
“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.
The goal of DataOps is to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts.
DataOps uses technology to automate the design, deployment and management of data delivery with the appropriate levels of governance and metadata to improve the use and value of data in a dynamic environment.”
Source: Gartner - Innovation Insight for DataOps - Dec. 2018
24. DataOps is gaining momentum
The number of data and analytics experts in business units will grow at 3X the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets.
80% of organizations will initiate deliberate competency development in the field of data literacy, acknowledging their extreme deficiency.
26. Data engineers need pipelines to deliver data
Data processing: Extract → Transform → Aggregate → Share, towards shared dataset(s) & data APIs for consumers. That’s where your good old data warehouse generally stands!
Data storing: datalakes, object storage, data virtualization.
If data is the new oil, datalakes are just oil fields (a passive mass of raw structured & unstructured data), Hive/Impala & co. are oil rigs, and the DataOps pipelines are the refineries that process the data… Car engines are the data science running on this fuel to provide a disruptive means of transportation!
#1 The datalake is not the point (even though companies focused on it). Data processing is.
#2 You don’t process data just for the pleasure of it. You do it to support activities which, in turn, bring value to the business.
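As an illustration of the Extract → Transform → Aggregate → Share flow above, here is a minimal sketch using only the Python standard library; the sales.csv file and its columns are assumptions made for the example, not part of the original deck.

```python
# Minimal sketch of an Extract -> Transform -> Aggregate -> Share data job.
# "sales.csv" and the column names ("country", "amount") are illustrative.
import csv
from collections import defaultdict

def extract(path: str) -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    # Clean and type the raw records (data preparation step); drop invalid rows.
    return [{"country": r["country"].strip().upper(), "amount": float(r["amount"])}
            for r in rows if r.get("amount")]

def aggregate(rows: list[dict]) -> dict:
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)

def share(totals: dict, path: str) -> None:
    # "Share" here is a simple file-based datamart; a real pipeline would load
    # a database, an indexing cluster, or publish a data API.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "total_amount"])
        writer.writerows(sorted(totals.items()))

if __name__ == "__main__":
    share(aggregate(transform(extract("sales.csv"))), "sales_by_country.csv")
```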
27. In comparison, Devs needed pipelines to deliver innovative apps
Code → Commit → Compile & test → Package → Deploy to Dev & test → Promote to … & test → Promote to PROD → Running app
29. Inception: DataOps (and AIOps) delivered in a DevOps way
Extract → Transform → Aggregate → Share → CONSUME
Data processing jobs (for ingesting, transforming data, etc.) are ultimately just pieces of code.
These pieces of code can themselves be delivered using DevOps principles :) Automated through delivery pipelines.
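Since data processing jobs are just code, they can be tested and promoted like any application code. The sketch below shows a pytest-style unit test for a hypothetical transform() job, the kind of check a delivery pipeline would run on every commit; the function and field names are illustrative.

```python
# Minimal sketch: a data-processing job is plain code, so it goes through the
# same commit -> test -> package -> deploy pipeline as application code.
# transform() and its fields are hypothetical examples.

def transform(rows):
    """Keep valid rows and normalise the country code."""
    return [{"country": r["country"].strip().upper(), "amount": float(r["amount"])}
            for r in rows if r.get("amount")]

def test_transform_normalises_and_filters():
    raw = [{"country": " fr ", "amount": "10.5"},
           {"country": "de", "amount": ""}]        # invalid row, must be dropped
    assert transform(raw) == [{"country": "FR", "amount": 10.5}]
```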
30. Inception: DataOps (& AIOps) to be achieved... in a DevOps way!
DataOps Orchestrator: enables the delivery and run of data projects (DataLab teams, data projects governance).
Software factory: the regular landscape for apps (app servers…), with DEV / UAT / PREPROD / PROD environments.
Feature teams x and y deliver successive versions (version n, n+1, n+2, n+3) driven by business needs and exposed as APIs.
31. Building up a DataOps platform
Concretely, you need a platform providing the following features:
- It must enable deploying data processing jobs, using the languages/stacks and technologies commonly used by data engineers (Apache Sqoop, Python, Java…). Regular ETLs may be part of the story.
- It must enable scheduling and running pipelines that aggregate jobs into logical sequences (acquiring data, preparing it, delivering it into datamarts: databases, indexing clusters…).
- It must provide data cataloging & governance features (to have a clear view of the data patrimony) and make data governance/security manageable (access control, etc.).
- It must provide the appropriate types of datamarts for the data patrimony (structured/unstructured, time-oriented or not, etc.).
- It must offer ergonomics that let data engineers and DataOps people be autonomous and productive (avoid tools not designed for them, such as regular “Ops” schedulers or raw use of complex tools such as Kubernetes…).
Progressively, more event-driven, data-streaming projects are arriving on the market. They also need an appropriate set of underlying technologies (Kafka clusters among them).
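A minimal sketch of the "schedule and run pipelines aggregating jobs in logical sequences" requirement: jobs as plain callables chained by a tiny runner with logging. Real DataOps platforms add triggers, retries, catalogs and UIs on top of this idea; the job names below are illustrative.

```python
# Minimal sketch of a pipeline runner: jobs chained in a logical sequence,
# with logging for basic observability. Job names are hypothetical.
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def ingest():   log.info("acquiring raw data...")
def prepare():  log.info("cleaning and shaping data...")
def publish():  log.info("loading the datamart...")

PIPELINE = [("ingest", ingest), ("prepare", prepare), ("publish", publish)]

def run(pipeline):
    for name, job in pipeline:
        start = time.time()
        try:
            job()
            log.info("job %s succeeded in %.1fs", name, time.time() - start)
        except Exception:
            log.exception("job %s failed, stopping the pipeline", name)
            break

if __name__ == "__main__":
    run(PIPELINE)
```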
33. Datahub commitments: build up a data capital
Capabilities: Data Dictionary & Catalog; Data Extraction / Lineage; Expertise animation, marketing, communication; Data Exposition; Data Processing; Data Warehouse / Data Lake; Data Viz; Data Quality; Governance / Security; Modelization.
Transversal commitment: build up & share a transverse data capital for the company.
The process is largely geared by DataOps pipelines!
This is an extract from a longer presentation: the extensive version can be found here https://bit.ly/33tfoNJ
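Data Quality is one of the datahub capabilities listed above. A minimal sketch of a quality gate might look like the following; the required fields and the checks are illustrative assumptions, not a specification from the deck.

```python
# Minimal sketch of a data-quality gate run before a dataset is published.
# Required fields and rules ("amount" must be non-negative) are hypothetical.
def quality_report(rows, required_fields=("country", "amount")):
    issues = []
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if not row.get(f)]
        if missing:
            issues.append(f"row {i}: missing {missing}")
        elif float(row["amount"]) < 0:
            issues.append(f"row {i}: negative amount")
    completeness = 1 - len(issues) / max(len(rows), 1)
    return {"issues": issues, "completeness": completeness}

report = quality_report([{"country": "FR", "amount": "10"}, {"country": "", "amount": "-3"}])
assert report["completeness"] == 0.5
print(report)
```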
34. Datahub commitments: deliver usecases
Steps: Data Collection; Data Exploration & Analysis tools; ML Code; ML Training (Model); Monitoring; Data Viz; Data Verification; Service; Presentation.
Deliver valuable usecases for the business.
The process is largely geared by a combination of DevOps + DataOps + ML/AIOps pipelines!
This is an extract from a longer presentation: the extensive version can be found here https://bit.ly/33tfoNJ
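To illustrate the "ML Code → ML Training (Model) → Monitoring" steps, here is a minimal sketch assuming scikit-learn is available; the synthetic data stands in for the prepared datasets produced by the DataOps pipelines.

```python
# Minimal sketch: train a model on a prepared dataset and record a baseline
# metric that later drift analysis can be compared against. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # synthetic label, for illustration only

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# "Monitoring" starts with a baseline metric, used later to detect performance drift.
print("baseline accuracy:", accuracy_score(y_test, model.predict(X_test)))
```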
36. From DevOps to DataOps & AIOps
Tribe with squads; Chapter devs; Chapter data engineer; Chapter data science; Chapter ...
False good idea: it sounds logical, prolonging the agile/DevOps paradigms. But it’s too early! You don’t have the maturity & critical mass to do this at the beginning!
37. From DevOps to DataOps & AIOps: short term
Tribe with squads and Chapter devs, Chapter …, plus a DataHub with its own squads, Chapter data science and Chapter data engineer (transversal activities, valuable usecases for the business).
Build a datahub first: it creates a clear positioning and gives visibility across the org.
Two objectives: deliver valuable usecases to ignite and show off the value of data, while the data used for them becomes the first data to integrate into your data catalog.
38. From DevOps to DataOps & AIOps: longer term
Data scientist chapters (per tribe & datahub) and data engineer chapters (per tribe & datahub) linked through guilds; Tribe with squads and Chapter devs, Chapter …; DataHub (transversal activities, valuable usecases for the business).
People working on business usecases will progressively move back into the regular organization: if they don’t, you are just creating a new silo, while the DevOps/agile organizations were intended to remove them (a paradox). What was useful as a first step should progressively spread through the org. You may only keep a few squads working on very innovative tech to address new usecases (e.g. deep learning once regular ML has become commonplace); they will also be responsible for fostering their expertise through the guild they animate. However, you keep people working on transversal data engineering topics.
39. Matrix organization & serendipity
This matrix organization (transversal datasets owned by the Datahub, securely shared with several isolated usecases) makes it possible to factorize the work (and so raise your dataset ROI). Each time a usecase team needs a new dataset, it should be capitalized by integrating it into the data catalog owned by the datahub (see the central team’s value?).
Serendipity: by having a clear understanding of your data patrimony, you can of course valorize it, but it may also spark new ideas! “Since I have this data, and this one, I may be able to [your_new_idea_here]”
“If only HP knew what HP knows, we'd be three times more productive” - Lew Platt, former CEO of Hewlett-Packard
Data Catalog: Dataset #1, Dataset #2, Dataset #3, Dataset #4, consumed by Usecase #1, Usecase #2, Usecase #3.
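A minimal sketch of the matrix idea above: a catalog that records which usecase consumes which dataset, so a new need is first checked against existing assets instead of rebuilding an ingestion pipeline. Dataset and usecase names are purely illustrative.

```python
# Minimal sketch of a dataset catalog shared across usecases. All names are hypothetical.
CATALOG = {
    "customer_profile": {"owner": "datahub", "used_by": {"churn_scoring", "segmentation"}},
    "web_logs":         {"owner": "datahub", "used_by": {"recommendation"}},
}

def register_usage(dataset: str, usecase: str) -> str:
    """Reuse an existing dataset when possible; otherwise add a new catalog entry."""
    if dataset in CATALOG:
        CATALOG[dataset]["used_by"].add(usecase)
        return f"reuse existing dataset '{dataset}' (dataset ROI goes up)"
    CATALOG[dataset] = {"owner": "datahub", "used_by": {usecase}}
    return f"new dataset '{dataset}' added to the catalog"

print(register_usage("web_logs", "fraud_detection"))                 # reuse
print(register_usage("iot_sensor_data", "predictive_maintenance"))   # new entry
```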
41. Data engineering vs Data Science
[80%] of a data project is roughly about data acquisition/preparation/sharing (data engineering).
[20%] of a data project is roughly about data valorization (data science, data analytics).
→ Your data scientists generally spend most of their time doing data engineering empirically when a clear data engineer position doesn’t exist in your organization!
- It’s not very efficient (data scientists cost much more than data engineers and are harder to hire)
- They generally don’t like this activity (and may end up leaving your company!)
- It happens regularly that two data scientists using the same data for different usecases build two identical ingestion/preparation pipelines for their projects (you miss the factorization effect)
42. Create clear Data Engineer and DataOps positions!
Data Engineers are the tech plumbers of data.
Key missions:
- Create and configure transformation/preparation jobs to ingest and shape the data
- Deliver them through appropriate datamarts (DBs, indexing clusters, APIs…)
- In small or loosely constrained setups, they may handle deployment/run of these processes themselves in PROD (a rather “NoOps” pattern), or this is offloaded to a specialized DataOps person shared among several data engineers
Background: closer to a developer / integrator than to a data scientist, but aware of data challenges and technologies (Sqoop, HDFS, Hive, Impala, Spark, object storage, etc.)
Data analysts & scientists are experts in valorizing the data.
Key missions:
- Develop BI, analytics and models based on the datasets they have
Background: may come from a very non-IT background (former statisticians are common); knowledgeable about specific frameworks (TensorFlow, etc.)
The Data Steward is a functional manager of data.
Key missions:
- Manage governance and security
Background: has a functional / business knowledge of the data
DataOps people are the local, specialized Ops attached to the data engineers & scientists.
Key missions:
- Offload deployment of jobs, pipelines and various assets built by the data engineers (and data scientists) from dev to prod
- Set up CI/CD toolchains and teach data engineers to work “in a DevOps way”
- Instrument/monitor data flows and data quality, manage the run time
- ...
Background: mostly a DevOps person, with an awareness of data challenges and technologies
These are transversal, support data functions.
44. How to start?
Focus on delivering early usecases to gain trust: data scientists and analysts should be your best friends.
● Define clear Data Engineer or even DataOps positions
● Provide them an industrial platform, enabling them to be more autonomous and productive (fewer round trips with ops)
● Empower pluridisciplinary data project teams and have them achieve some first (simple!) use cases to build confidence and gain more budget if needed
● Set up, empirically, a basic data catalog made of the datasets gathered and prepared for your usecases
Don’t enforce organization changes yet! Foster day-to-day collaboration on operational topics first. Adopting technologies and automation comes naturally to tech people (the IT dept. in the first row); it is a fairly organic process. But changing the organization is much more sensitive (management reorganization, changes to people’s objectives, etc.). This should be done at a later step, when some early victories have helped gain trust and prove your path is the right one.
45. How to start?
Now it’s time to shape your datahub:
● On the tech side: automate the whole toolchain (CI/CD), shift to more (complex) use cases (AI…), scale out the platform
● Start changing organization / management: set up your datahub with a clear commitment, and spend more energy on the DataOps part, since enough usecases have been delivered to justify the factorization/transversal effect
On a longer term, scuttle your work!
● More seriously: your initial siloed approach gave you the critical mass to bootstrap. Now it’s time to de-silo your datalab and spread it across the whole IT dept; if you don’t, you have just created a data-driven sub-IT inside the larger IT ecosystem, with little porosity
46. BEWARE
Data engineering is a hidden key success factor (hidden because the spotlights are on data scientists) to accelerate your data projects, increase their reliability and enhance their ROI.
But don’t “do DataOps for DataOps’ sake”!
Remember: DataOps is there to serve and offload the pains of data scientists & analysts, who in turn transform business needs into solutions. Exactly like ITOps is there to provide infrastructure assets to any app / data team of the IT dept...
47. WeWork
92 Av. des Champs-Élysées
75008 Paris - France
Seine Innopolis
72, rue de la République
76140 Le Petit-Quevilly - France
Thank you!
@adrienblind