New AI-based analytics accelerate truth-finding missions along the typical dimensions: Who, When, Where, Why, What, How and How Much.
In this very practical webinar, Johannes Scholtes (ZyLAB) and Paul Starrett (a licensed attorney and private investigator with extensive experience in high-profile investigations) talk with Mary Mack (ACEDS) and illustrate how these techniques help legal professionals speed up the eDiscovery process and improve its quality.
Introduction to Data Science (Data Summit, 2017) - Caserta
This document summarizes an introduction to data science presentation by Joe Caserta and Bill Walrond of Caserta Concepts. Caserta Concepts is an internationally recognized data innovation and engineering consulting firm. The agenda covers why data science is important, challenges of working with big data, governing big data, the data pyramid, what data scientists do, standards for data science, and a demonstration of data analysis. Popular machine learning algorithms like regression, decision trees, k-means clustering and collaborative filtering are also discussed.
This document discusses the importance of data fluency skills in the 21st century. It defines key terms like data science, machine learning, data literacy, and statistical literacy. While these fields require extensive training, the document argues that domain expertise combined with basic data analysis skills can solve many problems. These basic skills include understanding data structures, using programming to interact with data, and exploratory data analysis through visualization. The data analysis process involves defining problems, collecting and preparing data, visualization and modeling, and communicating results. RStudio is presented as a tool that can support the entire data analysis process within a single integrated development environment.
Closing the data source discovery gap and accelerating data discovery comprises three steps: profile, identify, and unify. This white paper discusses how the Attivio platform executes those steps, the pain points each one addresses, and the value Attivio provides to advanced analytics and business intelligence (BI) initiatives.
Smart Data Webinar: A Roadmap for Deploying Modern AI in Business - DATAVERSITY
Adopting elements of modern AI and cognitive computing - including advanced natural language processing, natural interface technologies such as gesture and emotion recognition, and machine learning - is rapidly becoming a necessity for new applications. As people in all industries are exposed to better, more personalized and responsive experiences with software, they will begin to demand more from every system they use. For product strategists and developers, the issue is not whether to consider modern AI but how to do so most effectively.
Webinar participants will learn:
• How to classify and map application attributes to AI technologies and tools, including data attributes, end-user attributes, and context attributes such as weather and location
• How to prioritize applications in an existing portfolio for AI enhancements, and
• How to assess organizational readiness for leveraging AI
Intro to Data Science for Enterprise Big Data - Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20: http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Text pre-processing of multilingual for sentiment analysis based on social ne... - IJECEIAES
Sentiment analysis (SA) is an enduring area of research, especially in the field of text analysis. Text pre-processing is an important aspect of performing SA accurately. This paper presents a text-processing model for SA of Twitter data, using natural language processing techniques. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction, and categorization of the data according to SA techniques. Keeping the focus on Twitter data, the data is extracted in a domain-specific manner. In the data cleaning phase, noisy data, missing data, punctuation, tags and emoticons have been considered. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. The accuracy of classification techniques improved after applying text pre-processing, and dimensionality was reduced. The proposed corpus can be utilized in the areas of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as the baseline for applying predictive analysis, machine learning and deep learning algorithms, which can be extended according to the problem definition.
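A minimal sketch of the kind of pre-processing pipeline described above (cleaning, tokenization and stop-word removal for tweets). The sample tweet, the regular expressions and the small stop-word list are illustrative assumptions, not the paper's actual implementation:

```python
import re

# Illustrative stop-word list; a real list (e.g. from NLTK) would be much larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "for", "on"}

def clean_tweet(text: str) -> str:
    """Remove noisy elements typical of tweets: URLs, user tags, hashtags and emoticon characters."""
    text = re.sub(r"http\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)   # @mentions and #hashtags
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation and emoticon characters
    return text.lower()

def preprocess(text: str) -> list[str]:
    """Tokenize on whitespace and drop stop words (SWR)."""
    tokens = clean_tweet(text).split()
    return [t for t in tokens if t not in STOP_WORDS]

if __name__ == "__main__":
    sample = "Loving the new phone!!! Battery life is great :) #gadgets @BrandX https://example.com"
    print(preprocess(sample))  # ['loving', 'new', 'phone', 'battery', 'life', 'great']
```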
Data science vs. Data scientist by Jothi Periasamy - Peter Kua
This document discusses data science vs data scientists and outlines key competencies for data scientists. It defines data science as modernizing existing analytics and data solutions using new data sources, formats, architectures, and techniques. The document compares traditional and modern approaches to data and analytics. It also discusses the skills required of entry-level vs senior data scientists, noting that enterprise data scientists require strong industry and business process skills while focusing on data, analytics, communication and technical abilities. The document provides an overview of the roles, responsibilities and deliverables of data scientists on enterprise projects.
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds (happiestmindstech)
The big impact of Big Data in the post-modern world is unquestionable, un-ignorable and unstoppable today. While there are certain discussions around Big Data being really big, here to stay or just an over-hyped fad, the facts shared in the following sections of this whitepaper validate one thing: there is no knowing the limits and dimensions that data in the digital world can assume.
Intro to Data Science for Non-Data Scientists - Sri Ambati
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document provides an overview of the data science process and tools for a data science project. It discusses identifying important business questions to answer with data, extracting relevant data from sources, cleaning and sampling the data, analyzing samples to create models and check hypotheses, applying results to full data sets, visualizing findings, automating and deploying solutions, and continuously learning and improving through an iterative process. Key tools mentioned include Hadoop, R, Python, Excel, and various data wrangling, analysis, and visualization tools.
The document is a summary of the 2015 Data Science Salary Survey conducted by O'Reilly Media. Over 600 respondents from various industries completed the anonymous online survey about their demographics, tasks, tools used, and compensation. Key findings include that the top four tools - SQL, Excel, R, and Python - remained the same as the previous year. Use of Spark and Scala has grown significantly compared to last year, and their users tend to earn more. Even when controlling for other factors, women are paid less than men. About 40% of the variation in salaries can be explained by the data provided in the survey.
“Semantic Technologies for Smart Services” - diannepatricia
Rudi Studer, Full Professor in Applied Informatics at the Karlsruhe Institute of Technology (KIT), Institute AIFB, presentation “Semantic Technologies for Smart Services” as part of the Cognitive Systems Institute Speaker Series, December 15, 2016.
Using technology intelligence tools, companies can cut the time spent on research and development from weeks or months to seconds or minutes. Technology intelligence refers to identifying technological opportunities and threats that could impact a company's future growth. These tools provide contextual access to relevant information and insights by combining web content, scientific journals, and patents with search technology and analysis. For example, a company could search for ways to reduce energy consumption and the tool would return a summary of solutions from various categories, such as approaches from the EPA and Department of Energy, in under a minute. This represents a shift from traditional research methods to quickly gaining actionable intelligence through intuitive searches.
KM - Cognitive Computing overview by Ken Martin 13Apr2016 - HCL Technologies
This document provides an introduction to cognitive computing and how it relates to knowledge management strategies. It begins with an overview of Ken Martin's background and the agenda. It then defines key cognitive computing concepts and technologies like natural language processing, machine learning, and pattern recognition. The document contrasts traditional and cognitive systems, noting cognitive systems are interactive, self-learning, and expand conversations. It maps cognitive capabilities to the KM lifecycle, showing how capabilities like natural language processing, text mining, and social network analysis can enhance each stage.
This document discusses Oracle's approach to big data and information architecture. It begins by explaining what makes big data different from traditional data, noting that big data refers to large datasets that are challenging to store, search, share, visualize, and analyze due to their volume, velocity, and variety. It then provides an overview of big data architecture capabilities and describes how to integrate big data capabilities into an organization's overall information architecture. The document concludes by outlining some key big data architecture considerations and best practices.
This presentation was prepared by one of our renowned tutors, "Suraj".
If you are interested in learning more about Big Data, Hadoop or Data Science, join our free introduction class on 14 Jan at 11 AM GMT. To register your interest, email us at [email protected]
Pay no attention to the man behind the curtain - the unseen work behind data ... - mark madsen
Goal: explain the nature of the work of an analytics team to a manager, and enable people on those teams to explain what a data science team needs to a manager.
It seems as if every organization wants to enable analytical decision-making and embed analytics into operational processes. What can you do with analytics? It looks like anything is possible. What can you really do? Probably a lot less than you expect. Why is this? Vendors promise easy-to-use analytics tools and services but they rarely deliver. The products may be easy, but the work is still hard.
Using analytics to solve problems depends on many factors beyond the math: people, processes, the skills of the analyst, the technology used, the data. Technology is the easy part. Figuring out what to do and how to do it is a lot harder. Despite this, fancy new tools get all the attention and budget.
People and data are the truly hard parts. People, because many believe that data is absolute rather than relative, and that analytic models produce an answer rather than a range of answers with varying degrees of truth, accuracy and applicability. Data, because managing data for analytics is a nuanced, detail-oriented and seemingly dull task left to back-office IT.
If your goal is to build a repeatable analytics capability rather than a one-off analytics project then you will need to address the parts that are rarely mentioned. This talk will explain some of the unseen and little-discussed aspects involved when building and deploying analytics.
The document outlines the syllabus for a course on data mining and data warehousing from Maulana Abul Kalam Azad University of Technology, West Bengal. It covers 7 units that discuss topics like introduction to data mining, data warehousing concepts, data mining techniques like decision trees and neural networks, mining association rules using various algorithms, clustering techniques, classification techniques, and applications of data mining. It also provides details on some core concepts like the stages of the knowledge discovery process, data mining functionalities, and classification of data mining systems.
Smart Data Slides: Data Science and Business Analysis - A Look at Best Practi... - DATAVERSITY
Google “citizen data scientist” today and you will see about 1M results. That number is data. It may be interesting, but it is meaningless without context. Sometimes it appears that we are drowning in data from systems and sensors but starving for insights. We definitely produce more of the former than the latter, which has created demand for more powerful tools to simplify the process and lower the skills requirement for analysis. As vendors build systems to meet this demand, we hear about the coming “democratization” of big data as more people at varying levels within organizations are empowered to find meaning and improve their own performance with data-driven insights. This is a good thing, but it does require caution.
To paraphrase Col Jessup in A Few Good Men: You want answers? You can’t handle the data.
In this webinar, we will survey emerging approaches to simplifying analysis, and discuss the benefits, dangers, and skills required for individuals and organizations to thrive in the brave new world of analytics everywhere, for everyone.
How the Analytics Translator can make your organisation more AI driven - Steven Nooijen
The document discusses how the Analytics Translator role can help organizations become more AI-driven by bridging the gap between business and technology. The Analytics Translator collects and prioritizes ideas, develops business cases for AI solutions, guides the solution development process, and drives adoption. Characteristics of a good Analytics Translator include understanding both business and AI, taking ownership, and operating at the intersection of UX, technology, and business. Developing this role is important for companies to successfully create impact and value from data and AI.
A SMART Seminar conducted on 3 May 2013 by Ian Bertram.
Leveraging information for decision making, assessing its value and ensuring frictionless sharing of information within the enterprise and beyond is what will fuel success in the current and future economy. New use cases with insatiable demand for real-time access to socially mediated and context-aware insights make information management in the 21st century dramatically different.
For more information, see http://goo.gl/a6F2c
iTrain Malaysia: Data Science by Tarun Sukhani - iTrain
The document provides an overview of data science and opportunities in the field. It discusses what data science and big data are, key components of data science like the "4 V's" of big data, and what a data scientist's skills and roles are. It also covers demand and opportunities in data science, giving examples of applications in different industries. It proposes an education framework for learning skills like coding, mathematics, statistics, machine learning and software engineering needed for a career in data science.
Data Scientist has been regarded as the sexiest job of the twenty-first century. As data in every industry keeps growing, the need to organize, explore, analyze, predict and summarize is insatiable. Data Science is creating new paradigms in data-driven business decisions. As the field emerges from its infancy, a wide range of skill sets is becoming an integral part of being a Data Scientist. In this talk I will discuss the different data-driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
AI and Legal Industry - Executive Overview - Graeme Wood
Artificial intelligence and semantic computing technologies can help address challenges facing the legal industry. AI can perform tasks like legal research and contract analytics to help lawyers. It works by analyzing both structured and unstructured data using natural language processing. Semantic computing finds relevant information by understanding relationships between concepts. The legal industry should develop a data strategy, capture different types of data, hire skilled talent, and implement analytic tools to start leveraging AI. This can help automate some legal work and make professionals more efficient.
This document provides an overview of data mining. It introduces data mining and its goals, which include prediction, identification, classification, and optimization. The typical architecture of a data mining system is explained, including its major components. Common data mining techniques like classification, clustering, and association are also outlined. Examples are provided to illustrate techniques. The document concludes by discussing advantages and uses of data mining along with some popular data mining tools.
1. Data mining involves extracting useful patterns and knowledge from large amounts of data. It can help uncover hidden patterns and relationships to help organizations make better decisions.
2. The document discusses various data mining techniques like classification, clustering, association rule mining and describes how each technique can be applied.
3. It also covers important aspects of data mining like the steps in the knowledge discovery process, different types of databases, visualization techniques, and major issues in data mining.
Bio IT World 2019 - AI For Healthcare - Simon Taylor, Lucidworks
1) An AI system implemented at Johns Hopkins Hospital helped optimize hospital operations and bed assignment. It allowed beds to be assigned 30% faster.
2) This reduced the need to keep surgery patients in recovery rooms longer than necessary by 80% and cut wait times for ER patients to receive beds by 20%.
3) The efficiencies also allowed the hospital to accept 60% more transfer patients from other hospitals.
The document summarizes an informational webinar about efficiently handling subject access requests through automation. It discusses the growing volume and complexity of subject access requests, as well as the challenges of the manual process. The webinar promotes automating tasks like deduplication, redaction, classification, searching, and producing documents through a software solution from ZyLAB that can help organizations scale to meet demand while reducing costs and risks. Automation through ZyLAB's eDiscovery platform is presented as helping make the subject access request process more efficient.
This talk is an introduction to Data Science. It explains Data Science from two perspectives: as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
Evidence Data Preprocessing for Forensic and Legal Analytics - CSCJournals
The document discusses best practices for preprocessing evidentiary data from legal cases or forensic investigations for use in analytical experiments. It outlines key steps like identifying the analytical aim or problem based on the case scope or investigation protocol, and understanding the case data through assessment and exploration of its format, features, quality, and potential issues. Challenges of working with common text-based case data like emails and social media posts are also discussed. The goal is to clean and transform raw data into a suitable format for machine learning or other advanced analytical techniques while maintaining integrity and relevance to the case.
Demystifying analytics in e discovery white paper 06-30-14 - Steven Toole
The document discusses analytics technologies used in eDiscovery and information governance. It describes how analytics can help reduce document review costs by identifying relevant documents through techniques like clustering, conceptual search, and auto-categorization. Applying analytics to proactively organize a company's electronic records before litigation arises helps keep costs low and investigations more efficient. The key benefit of analytics is reducing the number of non-relevant documents reviewers need to examine, thereby saving time and money in the discovery process.
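As an illustration of the clustering technique mentioned above, the sketch below groups a handful of toy documents by topic using TF-IDF vectors and k-means from scikit-learn. The documents and the choice of two clusters are assumptions made for the example, not part of the white paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy document set standing in for a review corpus.
docs = [
    "invoice payment for consulting services",
    "quarterly invoice and payment schedule",
    "meeting notes on product launch marketing",
    "marketing plan for the product launch event",
]

# Represent each document as a TF-IDF vector, then cluster.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, doc)  # invoice-related and marketing-related documents fall into separate clusters
```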
Understanding Data Science: Unveiling the Basics
What is Data Science?
Data science is an interdisciplinary field that combines techniques from statistics, mathematics, computer science, and domain knowledge to extract insights and knowledge from data. It involves collecting, processing, analyzing, and interpreting large and complex datasets to solve real-world problems.
Importance of Data Science
In today's data-driven world, organizations are inundated with data from various sources. Data science allows them to convert this raw data into actionable insights, enabling informed decision-making, improved efficiency, and innovation.
Intersection of Data Science, Statistics, and Computer Science
Data science borrows heavily from statistics and computer science. Statistical methods help in understanding data patterns, while computer science provides the tools to process and analyze large datasets efficiently.
Key Components of Data Science
Data Collection and Storage
The first step in data science is gathering relevant data from various sources. This data is then stored in databases or data warehouses for further processing.
Data Cleaning and Preprocessing
Raw data is often messy and inconsistent. Data cleaning involves removing errors, duplicates, and irrelevant information. Preprocessing includes transforming data into a usable format.
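A minimal cleaning sketch with pandas, using a made-up customer table to show the steps mentioned here (dropping duplicates, handling missing values, fixing types). The column names and fill rules are illustrative assumptions:

```python
import pandas as pd

# Small made-up dataset with typical problems: a duplicate row, a missing age, ages stored as text.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "age": ["34", "29", "29", None],
    "country": ["NL", "US", "US", "DE"],
})

clean = (
    raw.drop_duplicates()                                                # remove the duplicated customer 2
       .assign(age=lambda d: pd.to_numeric(d["age"], errors="coerce"))   # convert text to numbers
)
clean["age"] = clean["age"].fillna(clean["age"].median())                # impute the missing age

print(clean)
```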
Exploratory Data Analysis (EDA)
EDA involves visualizing and summarizing data to uncover patterns, trends, and anomalies. It helps in forming hypotheses and guiding further analysis.
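A short EDA sketch in the same spirit: summary statistics, category counts and a correlation check on an assumed sales table (the column names and values are invented for the example):

```python
import pandas as pd

# Assumed example data: daily ad spend, revenue and region.
sales = pd.DataFrame({
    "ad_spend": [100, 150, 80, 200, 120],
    "revenue":  [900, 1300, 700, 1800, 1000],
    "region":   ["north", "south", "north", "south", "north"],
})

print(sales.describe())                       # central tendency and spread of numeric columns
print(sales["region"].value_counts())         # how observations are distributed over categories
print(sales[["ad_spend", "revenue"]].corr())  # a first look at the ad spend / revenue relationship
```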
Machine Learning and Predictive Modeling
Machine learning algorithms are used to build predictive models from data. These models can make predictions and decisions based on new, unseen data.
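For instance, here is a minimal scikit-learn sketch of training a model and scoring new, unseen data; the churn-style toy data and feature meanings are assumptions made for the example:

```python
from sklearn.linear_model import LogisticRegression

# Toy training data: [monthly_usage_hours, support_tickets] -> churned (1) or stayed (0).
X_train = [[40, 0], [35, 1], [5, 4], [8, 3], [50, 0], [3, 5]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression().fit(X_train, y_train)

# Predict for customers the model has never seen.
X_new = [[45, 1], [4, 6]]
print(model.predict(X_new))        # e.g. [0 1]: the heavy user likely stays, the struggling user likely churns
print(model.predict_proba(X_new))  # class probabilities behind those decisions
```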
Data Visualization
Visual representations of data, such as graphs and charts, help in understanding complex information quickly. Data visualization aids in conveying insights effectively.
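A small matplotlib example of turning numbers into a chart, using assumed monthly revenue figures:

```python
import matplotlib.pyplot as plt

# Assumed monthly revenue figures for the example.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
revenue = [12.1, 13.4, 11.8, 15.2, 16.0]  # in thousands

plt.figure(figsize=(6, 3))
plt.bar(months, revenue, color="steelblue")
plt.ylabel("Revenue (k$)")
plt.title("Monthly revenue")
plt.tight_layout()
plt.show()  # a trend that is hard to spot in a table becomes obvious in the chart
```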
The Data Science Process
Problem Definition
The data science process begins with understanding the problem you want to solve and defining clear objectives.
Data Collection and Understanding
Collect relevant data and understand its context. This step is crucial as the quality of the analysis depends on the quality of the data.
Data Preparation
Clean, preprocess, and transform the data into a suitable format for analysis. This step ensures that the data is accurate and ready for modeling.
Model Building
Select appropriate algorithms and build predictive models using machine learning techniques. This step involves training and fine-tuning the models.
Model Evaluation and Deployment
Evaluate the model's performance using metrics and test datasets. If the model performs well, deploy it for making predictions on new data.
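A sketch of the evaluation step described here: hold out a test set, score the model, and persist it for deployment. The synthetic data and the file name are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Synthetic data standing in for a real, prepared dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# If the metrics are acceptable, persist the model so it can serve new predictions.
joblib.dump(model, "churn_model.joblib")
```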
Technologies Driving Data Science
Programming Languages
Languages like Python and R are widely used in data science due to their extensive libraries and versatility.
Machine Learning Libraries
Libraries like Scikit-Learn and TensorFlow provide ready-made implementations of machine learning algorithms, which simplifies building, training and evaluating models.
Introduction to Data Science and Data Analytics - VrushaliSolanke
Data science involves extracting meaningful insights from raw and structured data using scientific methods, technologies, and algorithms. It is a multidisciplinary field that uses tools to manipulate and analyze large amounts of data to find new and useful information. Data science uses powerful hardware, programming, and efficient algorithms to solve data problems and is the future of artificial intelligence. It involves collecting, preparing, analyzing, visualizing, managing, and preserving large data sets. Examples of data science applications include smart watches and Tesla's use of deep learning for self-driving cars.
District Office of Info and KM - Proposed - by Joel Magnussen - 2004 - Peter Stinson
The document discusses the potential benefits of improved information sharing and knowledge management. It envisions a future where everyone within an organization has access to all relevant information whenever needed. This would allow for better decision-making, more efficient responses to issues, and continuous learning from past experiences and events. The document outlines several initiatives underway to build an integrated information framework with these goals.
The document discusses data mining and provides definitions and explanations of key concepts. It defines data mining as the process of discovering patterns in large data sets involving methods from statistics, machine learning, and database systems. It describes the main components of data mining as including classification, association rule learning, and clustering. Examples of real-world applications are also given such as market basket analysis, fraud detection, and scientific research.
This document provides an overview of data mining. It defines data mining as the process of discovering novel and useful patterns from large amounts of data. The document outlines the main components of data mining, distinguishing it from regular data analysis. It also discusses the data mining process, major data mining techniques like classification and clustering, sources of data, challenges, and advantages. The goal of data mining is to extract useful knowledge from vast amounts of data.
The document discusses how machine learning and natural language processing can be used to analyze large amounts of text data from sources like emails in order to identify patterns and predict future behaviors. It notes that while human language contains a lot of useful information, it is also messy and ambiguous. The document proposes using techniques like machine learning algorithms, statistical models, and human analysis of key risk indicators to help reduce noise and increase the meaningful signals that can be extracted from text data.
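A toy illustration of reducing noise with key risk indicators: a plain-Python filter that flags emails containing assumed risk phrases, so that only a small, meaningful subset is passed on for closer human or machine analysis. The phrase list and emails are invented for the example:

```python
# Assumed key-risk-indicator phrases; a real list would be tuned with domain experts.
RISK_PHRASES = ["delete this email", "off the books", "keep this between us", "special payment"]

emails = [
    "Lunch at noon? The usual place.",
    "Please keep this between us until the audit is over.",
    "Attached is the Q3 report for review.",
    "Route the special payment through the other account.",
]

def flag(email: str) -> list[str]:
    """Return the risk phrases found in an email (case-insensitive)."""
    text = email.lower()
    return [p for p in RISK_PHRASES if p in text]

for email in emails:
    hits = flag(email)
    if hits:  # only the flagged messages move on to deeper review
        print(f"FLAGGED ({', '.join(hits)}): {email}")
```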
Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization's competitive advantage.
The document introduces data mining and knowledge discovery in databases. It discusses why data mining is needed due to large datasets that cannot be analyzed manually. It also covers the data mining process, common data mining techniques like association rules and decision trees, applications of data mining in various domains, and some popular data mining tools.
This document provides an overview of data mining and knowledge discovery in databases. It discusses why data mining is needed due to large volumes of data, describes the data mining process including data preparation, transformation, mining methods and model evaluation. Specific data mining techniques discussed include association rule mining to find frequent patterns in transactional data and decision tree learning as a supervised learning method to classify instances.
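To make the association-rule idea concrete, the short sketch below computes support and confidence for item pairs in a few invented transactions, a scaled-down version of what Apriori-style algorithms do on real transactional data:

```python
from itertools import combinations
from collections import Counter

# Invented market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

# Report rules A -> B whose support and confidence exceed small illustrative thresholds.
for pair, count in pair_counts.items():
    a, b = tuple(pair)
    support = count / n
    for x, y in [(a, b), (b, a)]:
        confidence = count / item_counts[x]
        if support >= 0.5 and confidence >= 0.75:
            print(f"{x} -> {y}  (support={support:.2f}, confidence={confidence:.2f})")
```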
This document provides an overview of big data by exploring its definition, origins, characteristics and applications. It defines big data as large data sets that cannot be processed by traditional software tools due to their size and complexity. Doug Laney is credited with defining the 3Vs of big data - volume, velocity and variety - in 2001. A variety of sectors where big data is used are discussed, including social media, science, retail and government. The document concludes by stating that we are in the age of big data due to new capabilities to analyze large data sets quickly and cost-effectively.
This document provides an overview of big data by exploring its definition, origins, characteristics and applications. It defines big data as large datasets that cannot be processed by traditional software tools due to size and complexity. The document traces the development of big data to the early 2000s and identifies the 3 V's of big data as volume, velocity and variety. It also discusses how big data is classified and the technologies used to analyze it. Finally, the document provides examples of domains where big data is utilized, such as social media, science, and retail, before concluding on the revolutionary potential of big data.
October 29, 2019, I was invited to present the keynote of the LegalTech Alliance meeting on eDiscovery and Big Data, in which 11 law departments from the Universities of Applied Sciences in the Netherlands participate.
eDiscovery is more important than ever. Future legal professionals must be able to deal with large electronic data sets so they can:
- Take decisions based on facts, not on guesses and assumptions;
- Answer information requests in a timely, accurate and complete manner;
- Avoid high costs, reputation damage, regulatory measures, business disruption and stress!
It is great that the LegalTech Alliance understands that need and that they embed eDiscovery in their educational programs.
Attached are the slides of the workshop where we presented the eDiscovery course (including the hands-on with ZyLAB) that we developed together with the University of Applied Sciences in Amsterdam.
Text mining scholtes - big data congress utrecht 2019 - jcscholtes
Wednesday September 18, for the second year in a row, I presented at the Big data Expo (#BigDataExpoNL) in Utrecht, the Netherlands on Text-Mining and how it can be used for big-data analytics on unstructured data, in particular for legal fact-finding missions and GDPR/AVG compliance use cases. Large crowds! Very successful event, good to see that big-data is such a hot topic these days!
Target-Based Sentiment Analysis as a Sequence-Tagging Task - jcscholtes
November 2019, Zoe Gerolemou successfully presented our paper on Target-Based Sentiment Analysis as a Sequence-Tagging Task at the Benelux Artificial Intelligence Conference (BNAIC 2019). In this research, we were not only able to detect sentiments with very high confidence, but also to determine WHO expressed these sentiments and about WHAT. Many great questions and several other very interesting presentations in the NLP session.
AI and applications in the legal domain - Studium Generale Maastricht 20191101 - jcscholtes
November 20, 2019, it was my great pleasure to present a special lecture on Artificial Intelligence and Applications in the Legal Domain. In this lecture I discuss how the development of machines that can learn, reason and act intelligently - Artificial Intelligence (AI) - is advancing rapidly in the legal domain. In some areas, machine intelligence has already surpassed the limits of what the brightest human minds are capable of achieving, especially in the field of eDiscovery and legal review of large data sets.
In others, machines still struggle with seemingly basic tasks. Nonetheless, breakthroughs in AI already have profound impact on the legal profession. AI is set to improve our world now and will continue to do so in the future. At the same time, there is the fear of losing control.
This lecture was part of a larger series on AI organized by our department of data science and knowledge engineering: https://www.maastrichtuniversity.nl/events/artificial-intelligence.
More information can be found here: https://textmining.nu
Augmented intelligence and the impact on your world in 2030 - jcscholtes
Technological innovation nowadays grows exponentially, while people are linear and cannot keep up with such rapid change. This leads to ever more societal discomfort and discontent. Our society will therefore have no choice but to put a figurative fence around all these new technological possibilities. Future innovations from the world of Artificial Intelligence will have to take this into account. This is what we call Augmented Intelligence.
In this short presentation, given on June 4, 2019 at Active Professionals in Rotterdam, I explain where these sudden rapid changes come from and which additional regulations and changes we can expect by 2030.
Text mining for Business Intelligence applications - jcscholtes
Text Mining for Business Intelligence
For most Business Intelligence specialists, the field of data mining is more familiar than that of text mining. A good example of data mining is the analysis of transaction data stored in relational databases, such as competitors' revenue figures or customers' financial transactions. Text is often much harder to work with because of its varying formats, ambiguity, inconsistency and errors.
However, more and more information is unstructured information in the form of text. Only a limited amount of information is stored in a structured format in a database. Think of social media, internet forums, websites, blogs or intranets (MS-SharePoint sites). Searching or analyzing these with traditional database or data mining techniques is impossible, because those techniques only work on structured information.
That is why the field of text mining focuses on developing advanced mathematical, statistical, linguistic and pattern-recognition techniques that make it possible to automatically structure and analyze unstructured information, to extract high-quality and relevant data, and thereby make the text as a whole easier to search.
High quality here refers in particular to the combination of relevance (in other words: finding the needle in the haystack) and gaining new, interesting insights.
These new techniques have already had a major impact in the world of law enforcement and intelligence services, as well as in the legal and financial domains. This lecture explains how Business Intelligence applications can benefit from these techniques when gathering valuable insights from open-source information or measuring sentiments and emotions about products, services or companies on social media and internet forums.
How can text-mining leverage developments in Deep Learning? Presentation at ... - jcscholtes
How can text-mining leverage developments in Deep Learning?
Text mining focuses primarily on extracting complex patterns from unstructured electronic data sets and applying machine learning for document classification. During the last decade, a generation of efficient and successful algorithms has been developed using bag-of-words models to represent document content and statistical and geometrical machine learning algorithms such as Conditional Random Fields and Support Vector Machines. These algorithms require relatively little training data and are fast on modern hardware. However, performance seems to be stuck around 90% F1 values.
In computer vision, deep learning has shown great success, and the 90% barrier has been broken in many applications. In addition, deep learning also shows new successes in transfer learning and self-learning approaches such as reinforcement learning. Dedicated hardware helped us overcome computational challenges, and methods such as training data augmentation solved the need for unrealistically large data sets.
So, it would make sense to apply deep learning to textual data as well. But how do we represent textual data? There are many different methods for word embeddings and as many deep learning architectures. Training data augmentation, transfer learning and reinforcement learning are not yet fully defined for textual data.
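To illustrate the classical baseline the abstract refers to (bag-of-words representations with a geometric classifier such as a Support Vector Machine), here is a minimal scikit-learn sketch; the tiny labeled corpus is invented and the pipeline is only indicative of the approach, not of the actual systems discussed:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny invented corpus: documents labeled as relevant (1) or not relevant (0) to an investigation.
docs = [
    "payment routed through offshore account",
    "invoice approved outside the normal process",
    "team lunch scheduled for friday",
    "minutes of the weekly status meeting",
]
labels = [1, 1, 0, 0]

# Bag-of-words (TF-IDF) features + linear SVM: the classic text-classification baseline.
clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(docs, labels)

print(clf.predict(["offshore payment approved without invoice"]))  # likely [1]
print(clf.predict(["agenda for the friday status meeting"]))       # likely [0]
```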
As part of the Haagse Hogeschool Law Program, a "Legal Analytics" course was organized in early 2019. ZyLAB was welcomed for a brief session on e-discovery for our international law students. Attached are the slides.
On March 21, the official opening of the Legal Tech Lab took place. During the afternoon, all partners, together with lecturers and students, gave a workshop. Below is the programme with an overview of the workshops. Attached is the eDiscovery presentation by ZyLAB.
Big Data and Data Science and the Judiciary - jcscholtes
In 2018, Prof. dr. H.J. van den Herik of the Leiden Centre of Data Science (LCDS), together with Prof. dr. ir. J.C. Scholtes of Maastricht University/ZyLAB and in cooperation with ZyLAB, gave training sessions in which several groups of 20 judges gained hands-on experience with Data Science and Big Data software.
The training focused on decision-support technology for the judiciary, drawing on ideas and concepts from the world of Big Data and Data Science.
Because much of the data of the Raad voor de Rechtspraak (Council for the Judiciary) is unstructured (textual) in nature, four sessions explained how the judiciary can make use of these kinds of techniques to support the judicial system.
How can Artificial Intelligence help me on the Battlefield? - jcscholtes
April 26, 2019, I was asked to present how Artificial Intelligence can help on the battlefield to the officers of the 11th Airmobile Brigade (11e Luchtmobiele brigade in Dutch) of the Dutch forces. The potential benefit of Artificial Intelligence on the battlefield is a very interesting, but also intriguing topic! Here you can find my slides. I have also written a blog on this topic, which contains several additional references and can be found as a LinkedIn article and as a blog on www.textmining.nu.
Big data analytics for legal fact finding - jcscholtes
This eLaw presentation was given on Thursday, March 14, at café 't Keizertje, Kaiserstraat in Leiden. The presentation contains a brief overview of big data analytics for legal fact finding.
Text mining scholtes - big data congress utrecht 2018 - jcscholtes
Much information nowadays no longer resides in databases, but in email collections, files on hard drives, or systems such as SharePoint. It is difficult to use this kind of data for Big Data analyses, even though it often contains the most information. Learn how text mining does make this possible.
The field of data mining is better known than that of text mining. A good example of data mining is the analysis of transaction data stored in relational databases, such as credit card payments or debit card transactions.
However, particularly in the legal domain, more than 90% of all information is unstructured. Only a limited amount of information is stored in a structured format in a database. Most of the information we work with daily is in text documents, emails, or multimedia files (speech, video, and photos). Searching or analyzing these with database or data mining techniques is impossible, because those techniques only work on structured information.
Structured information is easier to search, manage, organize, share and report on, not only for people but also for computers. Hence the desire to structure unstructured information, so that both people and computers can handle it better and we can apply familiar techniques and methods.
That is why the field of text mining focuses on developing advanced mathematical, statistical, linguistic and pattern-recognition techniques that make it possible to automatically analyze unstructured information, to extract high-quality and relevant data, and thereby make the text as a whole easier to search.
High quality here refers in particular to the combination of relevance (in other words: finding the needle in the haystack) and gaining new, interesting insights.
With text mining techniques, instead of searching for words we can search for linguistic patterns of words; this is searching and analyzing at a higher level!
These new techniques already have an incredible impact in the legal field: think of preparing M&A processes, compliance with the GDPR (AVG), and conducting large-scale data investigations for lawsuits, arbitration, regulatory requests or other legal matters.
Learn in this lecture how text mining is rapidly changing these rather conservative fields and how, with the help of smart technology, work can be done better, faster and more efficiently, consigning much of the dull, monotonous work to the past.
This document summarizes the benefits of using artificial intelligence and machine learning in the legal field. It discusses how AI can more effectively and efficiently analyze large amounts of documents through techniques like natural language processing. The document presents research that found AI outperformed humans in reviewing documents for relevance. It also provides examples of how AI is already being used for contract review, emotion detection, and assisting with investigations. The conclusion is that machine learning can review documents 3-20 times faster than humans and find 20-100% more relevant information, making the process smarter, better, and faster.
Legal Guide for issuing utility tokens in the EULawarton
This guideline is for anyone involved in launching a crypto project in the EU, especially if you're thinking about issuing utility tokens. With the EU’s Markets in Crypto-Assets Regulation (MiCA) coming into full effect in 2025, the regulatory landscape is changing fast. If your project gives users tokens to access your platform or services, it’s time to take MiCA seriously.
pdf Freedom of press a very important slide.pdfiffat91
Press freedom is instrumental to the fulfilment of the human right to freedom of expression, in particular the right to seek, impart and receive information and ideas of all kinds. A free press plays a vital role in holding governments and other powerful actors to account.
How San Diego Courts Handle Custody for Unmarried ParentsAndrson Smith
Dive into this presentation to know how San Diego courts handle custody for unmarried parents and the legal steps involved in securing parental rights. Learn about paternity, custody types, and how courts prioritize the child’s best interests every step of the way.
LEGAL RIGHTS FOR LAW STUDENTS AND ALSO FOR TEACHERS ALL ABOUT LAWYERSayeshakainat555
An overview of legal rights as a law subject: what they are, how they are classified, and the main types, with examples, aimed at law students and teachers.
M/S Bikaji Foods Int. Ltd vs M/S Desai Brothers Limited & Anr, Delhi High Court – October 11, 2023 (a case on trademark and geographical indication rights)
Key dates and events:
August 25, 2016 – Desai Brothers applied for the trademark "Pitaara Bikaneri Bhujia".
October 1, 2021 – The trademark application was abandoned after opposition from Bikaji Foods.
October 5, 2023 – First hearing; the defendants accepted court summons.
October 11, 2023 – Major hearing; the court observed packaging similarity and ordered an inventory check.
November 21, 2023 – A Local Commissioner was appointed to inspect the manufacturing premises.
January 30, 2024 – Next scheduled hearing for further arguments and review of the revised packaging.
Key observations:
1. Packaging similarity – The court compared the Bikaji and Pitaara Bikaneri Bhujia packets and found significant resemblance in color, design, and layout.
2. Manufacturer transparency issues – The court found confusion over who manufactures versus who packs the products.
3. Need for clear differentiation – The defendants must submit new packaging to avoid misrepresentation.
Key takeaways:
For businesses: Brand protection – companies must actively protect trademarks. Legal compliance – GI and trademark laws must be followed.
For consumers: Transparency – clearer branding helps avoid confusion.
For the legal system: GI and trademark laws are evolving, strengthening consumer protection and brand identity.
The Risks of Delaying KSA PDPL Compliance - Why Early Action MattersPyxos
As Saudi Arabia enforces its Personal Data Protection Law (PDPL), delays come at a high cost including penalties, lost trust, and emergency fixes. In partnership with the Riyadh Chamber of Commerce, this webinar was delivered by our CEO & Founder, James Beriker, along with team members Anurag Sushant and Varun Arora on May 5th.
In it, our team shared the lessons from GDPR, the risks of waiting to comply with KSA's PDPL, and how early action helps organizations save time, reduce costs, and stay in control.
In the second half of the webinar, our team walked through how to achieve and sustain compliance using our methodology and technology.
AI-Governance-Guidelines - Download the whitepaper now.DaviesParker
How New AI-Based Analytics Ignite a Productivity Revolution in eDiscovery
1. HOW NEW AI-BASED ANALYTICS IGNITE A PRODUCTIVITY REVOLUTION IN EDISCOVERY
ACEDS Webinar - August 24th, 2017
2. TODAY’S SPEAKERS
Mary Mack, Executive Director, ACEDS
Paul Starrett, specialist in electronic evidence and data science in the legal profession
Johannes Scholtes, CSO at ZyLAB and Professor of Text Mining at the University of Maastricht
4. Tools from the field of Artificial Intelligence and Data Science accelerate truth-finding missions in regulatory requests and internal investigations.
New AI-based analytics have drastically increased the speed and improved the quality of the eDiscovery process.
But what exactly are these new AI techniques, and how do they compare to all the other analytics we have been using for years?
TODAY’S AGENDA
5. THE BUZZ
e-Discovery & Artificial Intelligence: the new reality
AI becomes good business practice
6. WHAT ARE WE TALKING ABOUT?
“Analytics” is the discovery, interpretation, and communication of meaningful patterns in data. The terms “analytics” or “analysis” describe functions ranging from reporting and review metrics to sophisticated search and advanced data- and text-mining and machine learning applications. Benefits also range across various dimensions.
“Artificial Intelligence (AI) is a broad, complex field of research. AI includes tasks such as reasoning, problem solving, knowledge representation, planning, machine learning, natural language processing, perception, motion, social intelligence, and even creativity. The ultimate goal is the creation of some form of general intelligence.”
7. The Usual Suspects:
Exploding data volumes;
New types of data (multi-media, social, BYOD);
Exploding eDiscovery costs;
New regulations and compliance requirements
GDPR
Cyber-security requirements
More enthusiastic regulators, especially outside of the US.
WHY WE SHOULD CARE
8. DEALING WITH THE EDISCOVERY DATA WAVE
In eDiscovery, you never know in advance:
How much data you will have;
What type of data it will be, and thus what type of processing is required;
What workflow and iterations you will have.
Automation, AI, and Data Science are very CPU- and memory-intensive, so you need intelligent, highly scalable load balancing and resource allocation to prevent bottlenecks and deal effectively with the “Data Wave” in eDiscovery.
9. Better understand your data: the ability to make better strategic decisions.
Early Case Assessment: build and justify eDiscovery budgets, resources, and timelines.
Reduce data volumes: cut through the noise and zero in on documents of interest.
Take an investigative approach: organize and prioritize documents.
Reduce your eDiscovery cost: improve the productivity and precision of your team.
Better quality: see greater consistency in coding decisions across similar documents.
Speed up litigation.
WHY ANALYTICS?
10. Humans have cognitive limitations when processing and deriving insights from large-scale document sets; humans simply cannot successfully synthesize large volumes of data.
Technology will help lawyers work more efficiently, effectively, and enjoyably.
Grossman & Cormack*: “TAR was not only more effective than human review at finding relevant documents, but also much cheaper … Overall, the myth that exhaustive manual review is the most effective—and therefore the most defensible—approach to document review is strongly refuted.”
WHY AI-BASED ANALYTICS?
* Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” Richmond Journal of Law and Technology, Vol. XVII, Issue 3.
12. Structural analytics (aka syntactic analytics): file, document, and forensic property extraction; metadata filtering; saved (full-text) searches; email thread detection; email thread reduction; detection of missing emails in a thread; duplicate and near-duplicate detection; language identification; communication analysis; time-line visualizations; geo-mapping; … (a small near-duplicate sketch follows after this slide).
Conceptual analytics (aka semantic or meaning-based analytics): keyword expansion (taxonomy); content clustering; content-based categorization; conceptual search; sentiment and emotion mining; semantic content analysis; word clouds; topic modeling; …
Machine learning (data-driven, predictive analytics): technology-assisted review; contract clause detection and classification; privilege detection; …
WHAT KIND OF ANALYTICS HAVE WE SEEN?
STRUCTURE OF DATA
MEANING OF DATA
LEARN FROM DATA
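As an illustration of the structural analytics listed above, the sketch below shows one common way to flag near-duplicate documents: character-shingle TF-IDF vectors compared with cosine similarity. It is a minimal sketch assuming scikit-learn is available; the sample documents and the 0.8 threshold are made up for illustration, and this is not how any particular vendor implements it.

# Near-duplicate detection sketch: character-shingle TF-IDF + cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = {
    "doc1": "The parties agree to settle the dispute for $250,000 by March 31.",
    "doc2": "The parties agree to settle this dispute for $250,000 by March 31st.",
    "doc3": "Quarterly revenue figures are attached for your review.",
}

# 5-character shingles are fairly robust to small edits, OCR noise, and reformatting.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(5, 5))
matrix = vectorizer.fit_transform(docs.values())
similarities = cosine_similarity(matrix)

names = list(docs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if similarities[i, j] > 0.8:  # tunable near-duplicate threshold (assumption)
            print(f"{names[i]} ~ {names[j]} (similarity {similarities[i, j]:.2f})")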
13. WHAT IS THE RELATION BETWEEN AI AND ANALYTICS?
eDiscovery needs:
Perception
Reading: OCR, handwriting detection, signature recognition
Listening: Audio search
Vision: Image classification
Language: Machine Translation
Intelligent Search
Machine Learning for search
Concept Clustering
Data Visualization
Text classification and categorization, at the level of the document, the paragraph (clause), or the sentence or phrase
AI provides the algorithms and evaluation methods:
Machine Learning
Decision trees
Support Vector Machines
Deep Learning (CNN)
Topic Modeling / Concept Search
Hierarchical Clustering
LSI
LDA
NMF
Natural Language Processing (NLP)
Shallow Parsing
Deep Parsing
Co-reference resolution
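To make the algorithm list above concrete, here is a minimal topic-modeling sketch using NMF over TF-IDF vectors with scikit-learn; the four-message corpus and the choice of two topics are assumptions for illustration only, not part of the original presentation.

# Topic-modeling sketch: NMF over TF-IDF (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

emails = [
    "Please wire the payment to the new account before the audit.",
    "The quarterly audit report shows a payment discrepancy.",
    "Lunch on Friday? The new cafeteria menu looks great.",
    "Friday lunch works for me, see you at the cafeteria.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(emails)

nmf = NMF(n_components=2, random_state=0)  # two latent topics (assumption)
doc_topics = nmf.fit_transform(tfidf)      # per-document topic weights

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(nmf.components_):
    top_terms = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")

LDA or LSI could be substituted for NMF in the same pipeline; the point is that documents are grouped by what they are about rather than by the exact keywords they contain.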
18. PERCEPTION: OCR ON BITMAPS
ZyLAB: people often screenshot or take pictures of such information, just in case or to remember… ZyLAB will pick up such images, OCR them, and make them findable.
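As a generic illustration of what OCR on bitmaps involves (not a description of ZyLAB's pipeline), the sketch below uses the open-source Tesseract engine via pytesseract; the file name is hypothetical and Tesseract itself must be installed separately.

# Generic OCR sketch using Tesseract via pytesseract (illustrative only).
from PIL import Image
import pytesseract

def ocr_image(path: str) -> str:
    """Return the text recognized in a screenshot or photographed document."""
    return pytesseract.image_to_string(Image.open(path))

text = ocr_image("screenshot_of_contract.png")  # hypothetical file name
print(text)
# The recognized text can then be added to the full-text index so the image
# becomes searchable alongside ordinary documents.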
19. STRUCTURAL: UNPACK EMBEDDED CONTENT
ZyLAB:
• Every embedded item is extracted and OCR-ed if needed.
• Search & Find
• Show in document family
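As a rough, standard-library-only illustration of unpacking embedded content (again, not ZyLAB's implementation), the sketch below walks the MIME parts of a single e-mail; the file name is hypothetical, and a real pipeline would also recurse into archives, Office documents, and other containers.

# Unpack the parts of an e-mail so each attachment can be indexed or OCR-ed.
from email import policy
from email.parser import BytesParser

def iter_parts(path: str):
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip container parts, yield only leaf content
        yield part.get_filename() or "(inline)", part.get_content_type(), part.get_payload(decode=True)

for name, content_type, payload in iter_parts("custodian_item.eml"):  # hypothetical file
    print(name, content_type, len(payload or b""), "bytes")
    # Image parts would be routed to OCR; text parts go straight to the index.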
24. Question, and the entities or patterns that address it:
Who is it about? PERSON, COMPANY, ORGANIZATION, EMAIL ADDRESS
What is it about? Results of topic modeling and concept clustering
When did it happen? DATE, TIME, MONTH, DAY, WEEK, YEAR
Where did it happen? ADDRESS, CITY, COUNTRY, CONTINENT, DEPARTMENT, and other geo-locations
Why did it happen? Sentiments, emotions, and cursing
How did it happen? Combining entities and facts
How much/often did it happen? Quantitative measures such as amounts, currencies, and other numbers; also frequencies and averages of entity occurrences
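A minimal sketch of how extracted entities can be mapped onto these questions, using spaCy's pretrained named-entity recognizer; the label-to-question mapping and the example sentence are illustrative assumptions rather than the mapping used in the presentation.

# Map named entities onto Who / Where / When / How much (illustrative only).
from collections import defaultdict
import spacy

QUESTION_FOR_LABEL = {
    "PERSON": "Who", "ORG": "Who",
    "GPE": "Where", "LOC": "Where", "FAC": "Where",
    "DATE": "When", "TIME": "When",
    "MONEY": "How much", "PERCENT": "How much", "CARDINAL": "How much",
}

nlp = spacy.load("en_core_web_sm")  # requires the small English model to be installed
doc = nlp("On 12 March 2016, Acme Corp wired $2.4 million to a broker in Zurich.")

answers = defaultdict(set)
for ent in doc.ents:
    question = QUESTION_FOR_LABEL.get(ent.label_)
    if question:
        answers[question].add(ent.text)

for question, values in answers.items():
    print(f"{question}: {sorted(values)}")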
25. MORE DETAILED INSIGHTS
More interesting is to combine the W's: for instance, look for Who is Where, or What happened When.
Who – Who
Who – Why
When – What
26. The era of traditional keyword and Boolean search seems to be over. Even the most brilliant query results in too many hits, and reviewing them takes too much time and too many resources.
People do not know exactly what to look for, which keywords to use, or how to spell them.
The quality of traditional search is much lower than searchers think (80% perceived versus 20-40% actual quality).
Only highly skilled searchers who master all (advanced) query options are able to get close to 80%. Even then, they cannot be sure that they did in fact find 80% of all relevant documents. This is another problem with measuring recall: you never know what you miss.
MACHINE LEARNING: THE NEW SEARCH
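A minimal sketch of the machine-learning idea behind technology-assisted review, assuming scikit-learn: a classifier trained on a handful of reviewer decisions scores the unreviewed documents so the likeliest relevant ones surface first. The documents, labels, and model choice are illustrative assumptions, not any vendor's product.

# Technology-assisted-review sketch: TF-IDF + logistic regression (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed set coded by human reviewers (1 = relevant, 0 = not relevant).
seed_docs = [
    "Board approved the off-book payment to the intermediary.",
    "Invoice routed through the shell company as discussed.",
    "Reminder: the office holiday party starts at 5 pm.",
    "Cafeteria menu for next week attached.",
]
seed_labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_docs, seed_labels)

# Score unreviewed documents and surface the most likely relevant ones first.
unreviewed = [
    "Please keep the intermediary payment off the quarterly report.",
    "New parking rules take effect on Monday.",
]
scores = model.predict_proba(unreviewed)[:, 1]
for document, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {document}")

In practice this loop is typically iterative: reviewers code the highest-ranked documents, the model is retrained, and the process repeats until the measured recall is acceptable.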
29. Have we found all relevant information? How complete is the data we sent to the regulator? Machine learning!
During this process, several quantitative measures can be calculated, such as precision, recall, F-values, and the precision of the return set. Based on these measurements, one can describe exactly how much of the relevant information has been found at which moment in the process.
HOW CAN WE MEASURE RECALL
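For concreteness, a small sketch of how those measures are computed from a reviewed sample; the counts are hypothetical, and a real matter would rely on statistically valid sampling to estimate them.

# Precision, recall, and F1 from reviewer-validated counts (hypothetical numbers).
def review_metrics(true_positives: int, false_positives: int, false_negatives: int):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: the model flagged 400 documents; reviewers confirmed 320 as relevant,
# and sampling suggests 80 relevant documents were missed.
p, r, f1 = review_metrics(true_positives=320, false_positives=80, false_negatives=80)
print(f"precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")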
34. ZyLAB’s Direct Collection delivers tremendous time savings in getting data ready for early case assessment and (first-pass) review. Direct Collection drastically reduces the cost and risks of downloading/uploading data or shipping tapes and hard disks around.
ZyLAB’s Deep Processing allows you to automatically reduce your data volumes before you send them on for review, without getting into trouble or being accused of data spoliation. Only if every component of the data is searchable can automated tools be used to reduce it.
Using ZyLAB’s Review Accelerators you can minimize the most expensive and time-consuming part of the eDiscovery process: TAR, batch tagging, sampling, redaction, email trails, …
Litigants use ZyLAB’s Early Case Assessment to quickly understand the facts and merits of a case, identify key custodians, and recognize critical information so they can develop an effective and realistic litigation strategy.
BENEFITS TO IN-HOUSE COUNSEL
35. BENEFITS TO LAW FIRMS
ZyLAB covers multiple eDiscovery use cases. One platform: more cases, more volume, better pricing.
No need to involve any third parties.
Bill the hours for project management and data science (machine learning) as well.
DIY: upload data, start reviewing with your team almost immediately, and bill the hours.
Find out what really happened with ZyLAB’s deep search and analytics.
Expand the review team.
Replace the bottom of the traditional earnings pyramid with “review robots”: make more margin.
Be more competitive.
Do more work with your current team: never pass on new opportunities because of capacity problems.
Less risk of errors and of missing key issues, and therefore less risk of liability claims and of higher insurance premiums.
37. “ZYLAB TAKES CARE OF THE PROCESS, SUPPORTS THE LAWYER BY THINKING COMMERCIALLY AND PROVIDES COMFORT WITH THE USE OF ADVANCED TECHNOLOGY”
Ruben Elkerbout, anti-trust lawyer and partner with Stek Lawyers