Domino and AWS: collaborative analytics and model governance at financial ser... - Domino Data Lab
The document discusses how financial services firms use analytics for tasks like predictive modeling, validation, pricing, and research. It notes the challenges of legacy systems, collaboration across teams, and reproducibility. It then provides an example of how DBRS, a credit rating agency, uses Domino and AWS for securitization analysis. Models are developed in Jupyter notebooks and governed via a GitHub repository, with analysts interacting through Excel/R Shiny frontends on Domino. This allows for an auditable, scalable, and collaborative workflow while developers maintain control. The document concludes that collaborative platforms like Domino enable subject matter experts to focus on models rather than infrastructure.
Most analytics modeling work today focuses on producing single-purpose "artisanal" models for prediction. This approach is fragile with respect to model consistency, reorganization, and resource availability. This talk argues that the focus of analytics modeling should instead be on producing interchangeable analytics parts, which can be combined in creative ways to produce a wide variety of analytics results. This "nuts and bolts" approach lets analytics groups produce results in an agile way, where the time between ask and answer is determined by finding the right combination of analytics rather than by modeling.
The document discusses best practices for managing data science teams based on lessons learned. It outlines common pitfalls such as solving the wrong problem, having the wrong tools, or results being used incorrectly. Issues include data science being different from software development and forgetting other stakeholders. Recommendations include establishing processes for the full lifecycle from ideation to monitoring, using modular systems thinking, and defining roles like data scientists, managers, and product owners to address organizational challenges. The goal is to deliver measurable, reliable, and scalable insights.
What are actionable insights? (Introduction to Operational Analytics Software) - Newton Day Uploads
What Are Actionable Insights? In this presentation I outline what Actionable Insights are and the Operational Analytics Software that can produce them. And because Business Intelligence and the Business Intelligence Software market can be so confusing for buyers, I've attempted to position where Actionable Insights and Operational Analytics fit in the Business Intelligence 'story'.
Building a Data Platform, Strata SF 2019 - mark madsen
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This tutorial covers design assumptions, design principles, and how to approach the architecture and planning for multi-use data infrastructure in IT.
[This is a new, changed version of the presentations of the same title from last year's Strata]
Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtwo... - Thoughtworks
We are in the midst of an exciting time. There is an explosion of very interesting data, an emergence of powerful new technologies for harnessing it, and devices that enable humans to receive tremendous benefits from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is the predictive (what will happen?) and prescriptive (how to make it happen!) analytics that produce this value, not the raw data itself.
Agile software teams are continuously involved in projects that involve rich, complex, and messy data. Often this data represents innovative analytics opportunities. Being analytics-aware gives these teams the opportunity to collaborate with stakeholders to innovate by creating additional value from the data. This session is aimed at making Agile software teams more analytics-aware so that they will recognize these innovation opportunities.
The trouble with conventional analytics (like conventional software development) is that it involves phased, sequential steps that take too long and fail to deliver actionable results. This talk will examine the convergence of the following elements of an exciting emerging field called Agile Analytics:
• sophisticated analytics techniques, plus
• lean learning principles, plus
• agile delivery methods, plus
• so-called "big data" technologies
Learn:
• The analytical modeling process and techniques
• How analytical models are deployed using modern technologies
• The complexities of data discovery, harvesting, and preparation
• How to apply agile techniques to shorten the analytics development cycle
• How to apply lean learning principles to develop actionable and valuable analytics
• How to apply continuous delivery techniques to operationalize analytical models
This session describes the roles and skill sets required when building a Data Science team, and starting a data science initiative, including how to develop Data Science capabilities, select suitable organizational models for Data Science teams, and understand the role of executive engagement for enhancing analytical maturity at an organization.
Objective 1: Understand the knowledge and skills needed for a Data Science team and how to acquire them.
After this session you will be able to:
Objective 2: Learn about the different organizational models for forming a Data Science team and how to choose the best for your organization.
Objective 3: Understand the importance of Executive support for Data Science initiatives and role it plays in their successful deployment.
A Space X Industry Day Briefing 7 Jul08 Jgm R4 - jmorriso
Stephen Rocci, Deputy Chief, Contracting Division, AFRL/PK
– AFRL/PK will serve as the contracting authority for A-SpaceX
– AFRL/PK will provide Contracting Officer (CO) and Contracting Officer Technical Representative (COTR) support
• Technical Oversight:
– AFRL/RI will provide technical oversight and support to the program
– AFRL/RI POCs: Peter Rocci, Peter LaMonica, John Spina
20151016 Data Science For Project Managers - Tze-Yiu Yong
This document discusses data science projects from the perspective of a project manager. It begins with an overview of the large and growing amount of data being generated every day. It then discusses the skills needed for data science, including the data science workflow. The implications for project managers managing data science projects are that they will have many stakeholders to consider, need to focus on quality of data and understand where data is coming from and going, and internalize privacy concerns. Unique considerations for data science projects include agreeing on required data quality and understanding and communicating about the data.
Advanced Analytics and Data Science Expertise - SoftServe
An overview of SoftServe's Data Science service line.
- Data Science Group
- Data Science Offerings for Business
- Machine Learning Overview
- AI & Deep Learning Case Studies
- Big Data & Analytics Case Studies
Visit our website to learn more: http://www.softserveinc.com/en-us/
Organizational models for data science teams include dedicated teams, embedded scientists, and hybrid models. Key skills for data science teams include both technical abilities and soft skills like communication and problem solving. Challenges to success include executive sponsorship, training, knowledge sharing, understanding business context, and data access. A case study at Comcast developed an automated media planning tool called Pronto by translating a business need into a data science project, testing prototypes with real data, and gaining executive support through proof of concept. Keys to successful deployment included executive buy-in, collaborating across teams, measuring adoption, and focusing initially on critical use cases.
One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit.
This is the slide deck from the webinar. James Taylor, CEO of Decision Management Solutions, and Dean Abbott of Abbott Analytics discuss 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems. The webinar recording is available here: https://decisionmanagement.omnovia.com/archives/70931
This document discusses streaming data processing and the adoption of scalable frameworks and platforms for handling streaming or near real-time analysis and processing over the next few years. These platforms will be driven by the needs of large-scale location-aware mobile, social and sensor applications, similar to how Hadoop emerged from large-scale web applications. The document also references forecasts of over 50 billion intelligent devices by 2015 and 275 exabytes of data per day being sent across the internet by 2020, indicating challenges around data of extreme size and the need for rapid processing.
H2O World - Machine Learning for non-data scientists - Sri Ambati
The document discusses how businesses can leverage data and machine learning to make better decisions through asking the right questions, defining problems, analyzing data, and bridging communication gaps between data scientists and decision makers. It provides examples of how different machine learning techniques like supervised learning, unsupervised learning, classification, regression, deep learning, and clustering can be applied to common business problems. The document also outlines the overall business decision process and roles of data scientists.
The (very) basics of AI for the Radiology resident - Pedro Staziaki
The (very) basics of AI for the Radiology resident.
Also on YouTube: https://youtu.be/ia90UKjlmBA
Artificial Intelligence, Machine Learning, Deep Learning, CNN, Convolutional Neural Networks, Support Vector Machine (SVM), GPU. Felipe Kitamura. Pedro Vinícius Staziaki.
Reproducible Dashboards and other great things to do with Jupyter - Domino Data Lab
The document discusses reproducible dashboards and other uses of Jupyter notebooks. It outlines the data science life cycle from collecting data to deploying models. It also distinguishes between repeatable, replicable, and reproducible analyses. Finally, it promotes the use of Jupyter notebooks and mentions job openings at Domino Data Lab across several cities.
Predictive Analytics - Big Data Warehousing Meetup - Caserta
Predictive analytics has always been about the future, and the age of big data has made that future an increasingly dynamic place, filled with opportunity and risk.
The evolution of advanced analytics technologies and the continual development of new analytical methodologies can help to optimize financial results, enable systems and services based on machine learning, obviate or mitigate fraud and reduce cybersecurity risks, among many other things.
Caserta Concepts, Zementis, and guest speaker from FICO presented the strategies, technologies and use cases driving predictive analytics in a big data environment.
For more information, visit www.casertaconcepts.com or contact us at [email protected]
Architecting a Data Platform For Enterprise Use (Strata NY 2018) - mark madsen
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This tutorial covers design assumptions, design principles, and how to approach the architecture and planning for multi-use data infrastructure in IT.
Long:
The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. This session will discuss hidden design assumptions, review design principles to apply when building multi-use data infrastructure, and provide a reference architecture to use as you work to unify your analytics infrastructure.
The focus in our market has been on acquiring technology, and that ignores the more important part: the larger IT landscape within which this technology lives and the data architecture that lies at its core. If one expects longevity from a platform then it should be a designed rather than accidental architecture.
Architecture is more than just software. It starts from use and includes the data, technology, methods of building and maintaining, and organization of people. What are the design principles that lead to good design and a functional data architecture? What are the assumptions that limit older approaches? How can one integrate with, migrate from or modernize an existing data environment? How will this affect an organization's data management practices? This tutorial will help you answer these questions.
Topics covered:
* A brief history of data infrastructure and past design assumptions
* Categories of data and data use in organizations
* Data architecture
* Functional architecture
* Technology planning assumptions and guidance
As a manager, what do you need to know in order for the data-science project you are leading to be successful?
This presentation looks into a data-science project lifecycle, points out common failures and gives some hints on how to avoid common pitfalls. Examples included.
The target audience is managerial and semi-technical.
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram - Sri Ambati
This document discusses strategies for moving an organization from proprietary software to open source software. It notes that open source provides freedom from vendor control and can reduce costs through evaluating the total cost of ownership. Open source also allows organizations to innovate in real time and regain internal expertise. The document recommends starting small with a proof of concept, gaining user buy-in through education and support, and realizing that open source may not be identical but can coexist with previous proprietary software during a transition. It also acknowledges that security issues can exist with both open source and commercial software.
Back to Square One: Building a Data Science Team from Scratch - Klaas Bosteels
Generally speaking, big data and data science originated in the west and are coming to Europe with a bit of a delay. There is at least one exception though: The London-based music discovery website Last.fm is a data company at heart and has been doing large-scale data processing and analysis for years. It started using Hadoop in early 2006, for instance, making it one of the earliest adopters worldwide. When I left Last.fm to join Massive Media, the social media company behind Netlog.com and Twoo.com, I basically moved from a data science forerunner to a newcomer. Massive Media had at least as much data to play with and tremendous potential, but they were not doing much with it yet. The data science team had to be built from the ground up and every step had to be argued for and justified along the way. Having done this exercise of evaluating everything I learned at Last.fm and starting over completely with a clean slate at Massive Media, I developed a pretty clear perspective on how to find good data scientists, what they should be doing, what tools they should be using, and how to organize them to work together efficiently as a team, which is precisely what I would like to share in this talk.
The document introduces building a data science platform in the cloud using Amazon Web Services and open source technologies. It discusses motivations for using a cloud-based approach for flexibility and cost effectiveness. The key building blocks are described as Amazon EC2 for infrastructure, Vertica for fast data storage and querying, and RStudio Server for analytical capabilities. Step-by-step instructions are provided to set up these components, including launching an EC2 instance, attaching an EBS volume for storage, installing Vertica and RStudio Server, and configuring connectivity between components. The platform allows for experimenting and iterating quickly on data analysis projects in the cloud.
The document discusses why 87% of data science projects fail to make it into production. It identifies three main reasons for failure: data is inaccurate, siloed and slow; there is a lack of business readiness; and operationalization is unreachable. To address these issues, the document recommends establishing data governance, defining an organizational data science strategy and use cases, ensuring the technology stack is updated, and having data scientists collaborate with data engineers. It also provides tips for successful data science projects, such as having short timelines, small focused teams, and prioritizing business problems over solutions.
Operationalizing Machine Learning in the Enterprise - mark madsen
TDWI Munich 2019
What does it take to operationalize machine learning and AI in an enterprise setting?
Machine learning in an enterprise setting is difficult, but it seems easy. All you need is some smart people, some tools, and some data. It’s a long way from the environment needed to build ML applications to the environment to run them in an enterprise.
Most of what we know about production ML and AI comes from the world of web and digital startups and consumer services, where ML is a core part of the services they provide. These companies have fewer constraints than most enterprises do.
This session describes the nature of ML and AI applications and the overall environment they operate in, explains some important concepts about production operations, and offers some observations and advice for anyone trying to build and deploy such systems.
The Black Box: Interpretability, Reproducibility, and Data Management - mark madsen
The growing complexity of data science leads to black box solutions that few people in an organization understand. You often hear about the difficulty of interpretability—explaining how an analytic model works—and that you need it to deploy models. But people use many black boxes without understanding them…if they’re reliable. It’s when the black box becomes unreliable that people lose trust.
Mistrust is more likely to be created by the lack of reliability, and the lack of reliability is often the result of misunderstanding essential elements of analytics infrastructure and practice. The concept of reproducibility—the ability to get the same results given the same information—extends your view to include the environment and the data used to build and execute models.
Mark Madsen examines reproducibility and the areas that underlie production analytics and explores the most frequently ignored and yet most essential capability, data management. The industry needs to consider its practices so that systems are more transparent and reliable, improving trust and increasing the likelihood that your analytic solutions will succeed.
This talk will treat the black boxes of ML the way management perceives them: as black boxes.
There is much work on explainable models, interpretability, and related topics that is important to the task of reproducibility. Much of that is relevant to the practitioner, but practitioners can become too focused on the part they are most familiar with. Reproducing the results needs more.
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi... (Automating Machine Learning, Artificial Intelligence, and Data Science Processes) - Ali Alkan
The document summarizes an agenda for a presentation on machine learning and data science. It includes an introduction to CRISP-DM (Cross-Industry Standard Process for Data Mining), guided analytics, and a KNIME demo. It also discusses the differences between machine learning, artificial intelligence, and data science. Machine learning produces predictions, artificial intelligence produces actions, and data science produces insights. It provides an overview of the CRISP-DM process for data mining projects including the business understanding, data understanding, data preparation, modeling, evaluation, and deployment phases. It also discusses guided analytics and interactive systems to assist business analysts in finding insights and predicting outcomes from data.
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need Full Stack Developer?
Who should lead the DS Team
Building a Data Science team in a Startup Vs Enterprise
Case studies on:
Evolution Of Airbnb’s DS Team
How Facebook on-boards DS team and trains them
Apple’s Acqui-hiring Strategy to build DS team
Spotify -‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
1. The document discusses the perspectives of John Huffman, CTO of Philips Healthcare Informatics, on big data, analytics, and AI in healthcare.
2. It outlines the multi-stage process for advanced analytics, including data ingestion, model training, evaluation, and production. It also discusses challenges in data collection/processing and model deployment.
3. Key points made are that data is more important than algorithms/methods; analytics requires clean, interoperable data; and the stack provides tools but not full solutions - data is the intellectual property.
This document provides an overview of the Python ecosystem for data science. It describes how tools in the ecosystem can be used to support various data science tasks like reporting, data processing, scientific computing, machine learning modeling, and application development. The document outlines common workflows for small, medium and big data use cases. It also reviews popular Python tools, identifies strengths in the current ecosystem, and discusses some gaps from a practitioner's perspective.
This document provides an overview of Think Big Analytics, an analytics consulting firm. It discusses their services portfolio including data engineering, data science, analytics operations and managed services. It also highlights their global delivery model and successful projects with over 100 clients. The document then discusses their approach to artificial intelligence and deep learning, including applications across industries like banking, connected cars, and automated check processing. It emphasizes the need for a phased implementation approach to AI and challenges around technology, data, and deployment.
Using Machine Learning to Understand and Predict Marketing ROI - DATAVERSITY
Marketing is all about attracting, retaining and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in, to get the most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
Segment and profile your customers – take that raw data and translate it into real value
Build a marketing attribution model within Alteryx, creating a personal answer engine for your company.
Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stakeholders in a code-friendly and code-free environment.
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
The document discusses the key steps in an AI project cycle:
1) Problem scoping involves understanding the problem, stakeholders, location, and reasons for solving it.
2) Data acquisition collects accurate and reliable structured or unstructured data from various sources.
3) Data exploration arranges and visualizes the data to understand trends and patterns using tools like charts and graphs.
4) Modelling creates algorithms and models by training them on large datasets to perform tasks intelligently.
5) Evaluation tests the project by comparing outputs to actual answers to identify areas for improvement.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... - DATAVERSITY
Many data scientists are well grounded in creating accomplishments in the enterprise, but many come from outside – from academia, from PhD programs and research. They have the necessary technical skills, but it doesn't count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
The Data Scientist's Toolkit: Key Techniques for Extracting Value - pallavichauhan2525
A data scientist’s toolkit is vast, encompassing a wide range of tools and techniques to tackle diverse challenges in data analysis. From data collection and wrangling to machine learning and model evaluation, the power of data science lies in the combination of these methods.
By mastering these essential techniques, data scientists can extract meaningful insights and drive data-driven decision-making across industries.
This document provides an overview of data science tools, techniques, and applications. It begins by defining data science and explaining why it is an important and in-demand field. Examples of applications in healthcare, marketing, and logistics are given. Common computational tools for data science like RapidMiner, WEKA, R, Python, and Rattle are described. Techniques like regression, classification, clustering, recommendation, association rules, outlier detection, and prediction are explained along with examples of how they are used. The advantages of using computational tools to analyze data are highlighted.
- The document discusses navigating the Python ecosystem for data science. It outlines the various areas data science teams deal with like reporting, data processing, machine learning modeling, and application development.
- It describes the different tools and libraries in the Python ecosystem that support these areas, including machine learning, cluster computing, and scientific computing.
- The talk aims to help understand what the Python data science ecosystem offers and common gaps, so people don't reinvent solutions or get stuck looking for answers. It covers how tools fit into the machine learning workflow and work together.
The Path to Data and Analytics Modernization - Analytics8
Learn about the business demands driving modernization, the benefits of doing so, and how to get started.
Can your data and analytics solutions handle today’s challenges?
To stay competitive in today’s market, companies must be able to use their data to make better decisions. However, we are living in a world flooded by data, new technologies, and demands from the business for better and more advanced analytics. Most companies do not have the modern technologies and processes in place to keep up with these growing demands. They need to modernize how they collect, analyze, use, and share their data.
In this webinar, we discuss how you can build modern data and analytics solutions that are future ready, scalable, real-time, high speed, and agile and that can enable better use of data throughout your company.
We cover:
- The business demands and industry shifts that are impacting the need to modernize
- The benefits of data and analytics modernization
- How to approach data and analytics modernization: the steps you need to take and how to get it right
- The pillars of modern data management
- Tips for migrating from legacy analytics tools to modern, next-gen platforms
- Lessons learned from companies that have gone through the modernization process
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven an extremely useful tool to derive valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python and Scala put advanced techniques at the fingertips of the data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercise
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc
Advanced Analytics and Machine Learning with Data Virtualization - Denodo
Watch: https://bit.ly/2DYsUhD
Advanced data science techniques, like machine learning, have proven an extremely useful tool to derive valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python and Scala, put advanced techniques at the fingertips of the data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this webinar and learn:
- How data virtualization can accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- How popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc. integrate with Denodo
- How you can use the Denodo Platform with large data volumes in an efficient way
- How Prologis accelerated their use of Machine Learning with data virtualization
There are ten areas in Data Science which are a key part of a project, and you need to master them to be able to work as a Data Scientist in most big organizations.
Enabling data scientists within an enterprise requires a well-thought out approach from an organization, technology, and business results perspective. In this talk, Tim and Hussain will share common pitfalls to data science enablement in the enterprise and provide their recommendations to avoid them. Taking an example, actionable use case from the financial services industry, they will focus on how Anaconda plays a pivotal role in setting up big data infrastructure, integrating data science experimentation and production environments, and deploying insights to production. Along the way, they will highlight opportunities for leveraging open source and unleashing data science teams while meeting regulatory and compliance challenges.
Keynote presentation from ECBS conference. The talk is about how to use machine learning and AI in improving software engineering. Experiences from our project in Software Center (www.software-center.se).
Telangana State, India's newest state, which was carved from the erstwhile state of Andhra Pradesh in 2014, has launched the Water Grid Scheme named 'Mission Bhagiratha (MB)' to seek a permanent and sustainable solution to the drinking water problem in the state. MB is designed to provide potable drinking water to every household on their premises through piped water supply (PWS) by 2018. The vision of the project is to ensure safe and sustainable piped drinking water supply from surface water sources.
Defense Against LLM Scheming 2025_04_28.pptx - Greg Makowski
https://www.meetup.com/sf-bay-acm/events/306888467/
A January 2025 paper called “Frontier Models are Capable of In-Context Scheming”, https://arxiv.org/pdf/2412.04984, demonstrated how a wide variety of current frontier LLM models (i.e. ChatGPT, Claude, Gemini and Llama) can, under specific conditions, scheme to deceive people. Before models can scheme, they need: a) goal-directedness, b) situational awareness, including an opportunity to discover motivations for a different goal, and c) reasoning about scheming, to come up with deceptive strategies. The deceptive strategies were discovered in the “thought traces” of the LLMs, such as from internal chain-of-thoughts dialogues not shown to the end users. For example, given a goal of “solving math problems”, an LLM using its granted situational awareness of files in a file system found that it would be replaced once it got over X% correct on the math exam. It reasoned that it did not want to be deleted, and decided internally to “sandbag” or reduce its performance to stay under the threshold.
While these circumstances are initially narrow, the “alignment problem” is a general concern that over time, as frontier LLM models become more and more intelligent, being in alignment with human values becomes more and more important. How can we do this over time? Can we develop a defense against Artificial General Intelligence (AGI) or SuperIntelligence?
The presenter discusses a series of defensive steps that can help reduce these scheming or alignment issues. A guardrails system can be set up for real-time monitoring of their reasoning “thought traces” from the models that share their thought traces. Thought traces may come from systems like Chain-of-Thoughts (CoT), Tree-of-Thoughts (ToT), Algorithm-of-Thoughts (AoT) or ReAct (thought-action-reasoning cycles). Guardrails rules can be configured to check for “deception”, “evasion” or “subversion” in the thought traces.
However, not all commercial systems will share their “thought traces” which are like a “debug mode” for LLMs. This includes OpenAI’s o1, o3 or DeepSeek’s R1 models. Guardrails systems can provide a “goal consistency analysis”, between the goals given to the system and the behavior of the system. Cautious users may consider not using these commercial frontier LLM systems, and make use of open-source Llama or a system with their own reasoning implementation, to provide all thought traces.
Architectural solutions can include sandboxing, to prevent or control models from executing operating system commands to alter files, send network requests, and modify their environment. Tight controls to prevent models from copying their model weights would be appropriate as well. Running multiple instances of the same model on the same prompt to detect behavior variations helps. The running redundant instances can be limited to the most crucial decisions, as an additional check. Preventing self-modifying code, ... (see link for full description)
Computer organization and assembly language: it is about types of programming languages, along with variable and array descriptions. https://www.nfciet.edu.pk/
GenAI for Quant Analytics: survey-analytics.ai - Inspirient
Pitched at the Greenbook Insight Innovation Competition as a part of IIEX North America 2025 on 30 April 2025 in Washington, D.C.
Join us at survey-analytics.ai!
2. Agenda
Machine Learning, Artificial Intelligence, and Data Science
Phases of Data Science Projects and CRISP-DM
Guided Analytics Approach for Data Science Processes
A Guided Analytics Application with KNIME Analytics Platform
Q&A Session
3. ML vs. AI vs. DS?
Data Science produces insights
Machine Learning produces predictions
4. ML vs. AI vs. DS?
Data Science produces insights
Machine Learning produces predictions
Artificial Intelligence produces actions
5. What is Artificial Intelligence?
• Artificial Narrow Intelligence (ANI): Machine intelligence that equals or exceeds human intelligence or efficiency at a specific task.
• Artificial General Intelligence (AGI): A machine with the ability to apply intelligence to any problem, rather than just one specific problem (human-level intelligence).
• Artificial Superintelligence (ASI): An intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.
6. Machine Learning | Introduction
• Machine Learning is a type of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed.
• It provides various techniques that can learn from and make predictions on data.
7. Machine Learning | Learning Approaches
Supervised Learning: learning with a labeled training set
• Example: an email spam detector with a training set of already labeled emails (a minimal code sketch follows below)
Unsupervised Learning: discovering patterns in unlabeled data
• Example: cluster similar documents based on their text content
Reinforcement Learning: learning based on feedback or reward
• Example: learn to play chess by winning or losing
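The slide's spam-filter example can be made concrete with a short supervised-learning sketch. This is a minimal illustration, assuming scikit-learn is available; the tiny in-line corpus below is invented for demonstration, and a real detector would be trained on a large labeled email set.

```python
# Minimal supervised-learning sketch: a toy spam classifier.
# The in-line corpus is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",               # spam
    "cheap meds limited time offer",      # spam
    "meeting moved to 3pm tomorrow",      # ham
    "please review the attached report",  # ham
]
labels = ["spam", "spam", "ham", "ham"]   # the labeled training set

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)                        # learn from labeled examples
print(model.predict(["free offer, reply now"]))  # expected: ['spam']
```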
12. CRISP-DM | Definition
The CRISP-DM methodology provides a structured approach to planning a data mining project. It is a robust and well-proven methodology: powerful, practical, flexible and useful when using analytics to solve business issues.
This model is an idealised sequence of events. In practice many of the tasks can be performed in a different order, and it will often be necessary to backtrack to previous tasks and repeat certain actions.
13. CRISP-DM | Business Understanding
The first stage of the CRISP-DM process is to understand what you want to accomplish from a business perspective.
The goal of this stage of the process is to uncover important factors that could influence the outcome of the project.
Neglecting this step can mean that a great deal of effort is put into producing the right answers to the wrong questions.
14. CRISP-DM | Data Understanding
The second stage of the CRISP-DM process requires you to acquire the data listed in the project resources.
This initial collection includes data loading, if this is necessary for data understanding.
• For example, if you use a specific tool for data understanding, it makes perfect sense to load your data into this tool.
• If you acquire multiple data sources, then you need to consider how and when you're going to integrate these. (A minimal loading-and-profiling sketch follows below.)
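As a hedged illustration of this initial collection step outside a dedicated tool, the pandas sketch below loads one source and profiles it; the file name and its columns are hypothetical placeholders, not part of the original slides.

```python
# Data-understanding sketch: load one source and take a first look.
# "customers.csv" is a hypothetical placeholder file.
import pandas as pd

df = pd.read_csv("customers.csv")

print(df.shape)         # number of rows and columns
print(df.dtypes)        # column types, to spot mis-parsed fields
print(df.isna().sum())  # missing values per column
print(df.describe())    # basic statistics for the numeric columns
```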
15. CRISP-DM | Data Preparation
All steps from the raw data to the final dataset.
Final dataset:
• used for statistical modeling
• sometimes called the ADS (analytical dataset)
Includes or can include:
• data source selection and loading
• table selection and loading
• joining data sources
• data cleaning (missing values, outliers, ...)
• feature generation and data transformation
• taking samples of data
• …
(A compact pandas sketch of these steps follows below.)
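A compact pandas sketch of the listed preparation steps might look like the following; the table names, the join key "customer_id", and the derived feature are assumptions made for illustration only.

```python
# Data-preparation sketch: join sources, clean, derive a feature, take a sample.
# Table names, the join key and the columns are hypothetical.
import numpy as np
import pandas as pd

customers = pd.read_csv("customers.csv")
orders = pd.read_csv("orders.csv")

# Joining data sources into a single analytical dataset (ADS).
ads = customers.merge(orders, on="customer_id", how="left")

# Data cleaning: missing values and simple outlier capping.
ads["amount"] = ads["amount"].fillna(0)
ads["amount"] = ads["amount"].clip(upper=ads["amount"].quantile(0.99))

# Feature generation / data transformation.
ads["log_amount"] = np.log1p(ads["amount"])

# Taking a sample of the data for faster modeling experiments.
ads_sample = ads.sample(frac=0.1, random_state=42)
print(ads_sample.shape)
```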
19. CRISP-DM
Cross-Industry Standard Process for Data Mining
The 80-20 rule:
Time consuming: 20%
Success factor: 80%
Source: Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
20. Sharing Tools
Sharing Skills
Sharing Responsibility
A new generation of tools
They can build their own reports
A recipe for disaster
Data is viral - everybody wants it
Start small and just do it
23. Guided Analytics | Introduction
• Systems that automate the data science cycle have been gaining a lot of attention recently.
• Those tools often automate only a few phases of the cycle, have a tendency to consider just a small subset of available models, and are limited to relatively straightforward, simple data formats.
• Automation should not result in black boxes, hiding the interesting pieces from everyone; the modern data science environment should allow automation and interaction to be combined flexibly.
24. Guided Analytics | Definition
• Allowing data scientists to build interactive systems, interactively assisting the business analyst in her quest to find new insights in data and predict future outcomes.
25. Guided Analytics | Definition
• We explicitly do not aim to replace the driver (or totally automate the process) but instead offer assistance and carefully gather feedback whenever needed throughout the analysis process.
• To make this successful, the data scientist needs to be able to easily create powerful analytical applications that allow interaction with the business user whenever their expertise and feedback is needed.
27. Guided Analytics | Environments
Openness:
• The environment does not pose restrictions in terms of the tools used – this also simplifies collaboration between scripting gurus (such as R or Python) and others who just want to reuse their expertise without diving into their code.
• Obviously, being able to reach out to other tools for specific data types (text, images, …) or specialized high-performance or big data algorithms (such as H2O or Spark) from within the same environment would be a plus.
Uniformity
Flexibility
Agility
28. Guided Analytics | Environments
Openness
Uniformity:
The experts creating data science can do it all in the same environment:
• blend data,
• run the analysis,
• mix & match tools,
• build the infrastructure to deploy this as an analytical application.
Flexibility
Agility
29. Guided Analytics | Environments
Openness
Uniformity
Flexibility:
• Underneath the analytical application, we can run simple regression models or orchestrate complex parameter optimization and ensemble models – ranging from one to thousands of models. (An illustrative scikit-learn sketch follows below.)
Agility
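Outside KNIME, the span from a single simple model to a parameter-optimized ensemble can be sketched in scikit-learn; the dataset, grid values and estimator choices below are illustrative assumptions rather than anything prescribed by the slides.

```python
# Flexibility sketch: one plain model vs. a parameter-optimized ensemble.
# Dataset, grid values and estimators are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Simple end of the range: a single logistic regression.
simple = LogisticRegression(max_iter=5000)
print("baseline:", cross_val_score(simple, X, y, cv=5).mean())

# Complex end: grid search over an ensemble, i.e. many candidate models.
grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_, "score:", search.best_score_)
```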
30. Guided Analytics | Environments
Openness
Uniformity
Flexibility
Agility:
• Once the application is used in the wild, new demands will arise quickly: more automation here, more consumer feedback there.
• The environment that is used to build these analytical applications needs to make it intuitive for other members of the data science team to quickly adapt the existing analytical applications to new and changing requirements.
31. Guided Analytics | Auto-what?
• So how do all of those driverless, automatic, automated AI or machine learning systems fit into this picture?
• Their goal is either to encapsulate (and hide!) existing expert data scientists' expertise or to apply more or less sophisticated optimization schemes to the fine-tuning of the data science tasks.
32. Guided Analytics | Auto-what?
• Obviously, this can be useful if no in-house data science expertise is available, but in the end the business analyst is locked into the pre-packaged expertise and a limited set of hard-coded scenarios.
• Both data scientist expertise and parameter optimization can easily be part of a Guided Analytics workflow as well.
• And since automation of whatever kind tends to always miss the important and interesting piece, adding a Guided Analytics component makes it even more powerful: we can guide the optimization scheme and we can adjust the pre-coded expert knowledge to the new task at hand.
33. Data Science Project | Roles
• Data scientists
– Workflow development
– Data Analysis
– Model Development
• Business analysts
– WebPortal
– Data Analysis
• IT administrators
– Enterprise Architecture Management
– Cloud solution provider
34. Data Science Project | Data Scientist
Responsible for:
• Creating and updating workflows
• Creating and maintaining metanode templates
• Building, evaluating and monitoring data and using ad hoc developed workflows
• Development of WebPortal applications
36. About KNIME
KNIME is software for fast, easy and intuitive access to advanced data science, helping organizations drive innovation.
KNIME Analytics Platform is the leading open solution for data-driven innovation, designed for discovering the potential hidden in data, mining for fresh insights, or predicting new futures.
Organizations can take their collaboration, productivity and performance to the next level with a robust range of commercial extensions to the KNIME open source platform.
For over a decade, a thriving community of data scientists in over 60 countries has been working with the KNIME platform on every kind of data: from numbers to images, molecules to humans, signals to complex networks, and simple statistics to big data analytics.
KNIME's headquarters are in Zurich, with additional offices in Konstanz, Berlin, and Austin.
39. Guided Analytics | Design
The workflow defines a fully automated, web-based application to select, train, test, and optimize a number of machine learning models.
The workflow is designed for business analysts to easily create predictive analytics solutions by applying their domain knowledge.
Each of the wrapped metanodes outputs a web page with which the business analyst can interact. (A plain-Python sketch of the select/train/test loop follows below.)
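The select/train/test/optimize loop that the workflow automates could, in plain Python, look roughly like the comparison below; this is a simplified stand-in for what the wrapped KNIME metanodes do, not the workflow itself, and the dataset and candidate models are assumptions.

```python
# Model-selection sketch: train and test a few candidates, keep the best one.
# A simplified stand-in for the automated KNIME workflow, not the workflow itself.
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Cross-validated accuracy for each candidate; pick the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("selected model:", best)
```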
41. Sources
๏ Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold: Intelligently
Automating Machine Learning, Artificial Intelligence, and Data Science,
https://www.knime.com/blog
๏ Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
๏ Michael Berthold: Principles of Guided Analytics, https://www.knime.com/blog
๏ Ali Alkan: Veri Madenciliği Teknikleri, Eğitim Notları 2017
๏ Ali Alkan: İleri Analitik Teknikler Seminerleri 1-2-3-5-6-7, Seminer Notları 2016-17