New AI-based analytics accelerate truth-finding missions along the typical dimensions: Who, When, Where, Why, What, How and How Much.
In this very practical webinar, Johannes Scholtes (ZyLAB) and Paul Starrett (a licensed attorney and private investigator with extensive experience in high-profile investigations) talk with Mary Mack (ACEDS) and illustrate how these techniques help legal professionals speed up the eDiscovery process and improve its quality.
Introduction to Data Science (Data Summit, 2017) - Caserta
This document summarizes an introduction to data science presentation by Joe Caserta and Bill Walrond of Caserta Concepts. Caserta Concepts is an internationally recognized data innovation and engineering consulting firm. The agenda covers why data science is important, challenges of working with big data, governing big data, the data pyramid, what data scientists do, standards for data science, and a demonstration of data analysis. Popular machine learning algorithms like regression, decision trees, k-means clustering and collaborative filtering are also discussed.
This document discusses the importance of data fluency skills in the 21st century. It defines key terms like data science, machine learning, data literacy, and statistical literacy. While these fields require extensive training, the document argues that domain expertise combined with basic data analysis skills can solve many problems. These basic skills include understanding data structures, using programming to interact with data, and exploratory data analysis through visualization. The data analysis process involves defining problems, collecting and preparing data, visualization and modeling, and communicating results. RStudio is presented as a tool that can support the entire data analysis process within a single integrated development environment.
Closing the data source discovery gap and accelerating data discovery comprises three steps: profile, identify, and unify. This white paper discusses how the Attivio platform executes those steps, the pain points each one addresses, and the value Attivio provides to advanced analytics and business intelligence (BI) initiatives.
Smart Data Webinar: A Roadmap for Deploying Modern AI in Business - DATAVERSITY
Adopting elements of modern AI and cognitive computing - including advanced natural language processing, natural interface technologies such as gesture and emotion recognition, and machine learning - is rapidly becoming a necessity for new applications. As people in all industries are exposed to better, more personalized and responsive experiences with software, they will begin to demand more from every system they use. For product strategists and developers, the issue is not whether to consider modern AI but how to do so most effectively.
Webinar participants will learn:
• How to classify and map application attributes to AI technologies and tools, including data attributes, end-user attributes, and context attributes such as weather and location
• How to prioritize applications in an existing portfolio for AI enhancements, and
• How to assess organizational readiness for leveraging AI
Intro to Data Science for Enterprise Big Data - Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide some great references for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20: http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Text pre-processing of multilingual for sentiment analysis based on social ne... - IJECEIAES
Sentiment analysis (SA) is an enduring area of research, especially in the field of text analysis. Text pre-processing is an important aspect of performing SA accurately. This paper presents a text-processing model for SA of Twitter data, using natural language processing techniques. The basic phases for machine learning are text collection, text cleaning, pre-processing, feature extraction, and categorization of the data according to SA techniques. Keeping the focus on Twitter data, the data is extracted in a domain-specific manner. In the data cleaning phase, noisy data, missing data, punctuation, tags and emoticons have been considered. For pre-processing, tokenization is performed, followed by stop word removal (SWR). The article provides insight into the techniques used for text pre-processing and the impact of their presence on the dataset. The accuracy of classification techniques improved after applying text pre-processing, and dimensionality was reduced. The proposed corpus can be utilized in the areas of market analysis, customer behaviour, polling analysis, and brand monitoring. The text pre-processing process can serve as the baseline for applying predictive analysis, machine learning and deep learning algorithms, which can be extended according to the problem definition.
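A minimal sketch of the kind of pre-processing pipeline described above (cleaning, tokenization and stop-word removal for tweets). The sample tweet, the regular expressions and the small stop-word list are illustrative assumptions, not the paper's actual implementation:

```python
import re

# Illustrative stop-word list; a real list (e.g. from NLTK) would be much larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "for", "on"}

def clean_tweet(text: str) -> str:
    """Remove noisy elements typical of tweets: URLs, user tags, hashtags and emoticon characters."""
    text = re.sub(r"http\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)   # @mentions and #hashtags
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation and emoticon characters
    return text.lower()

def preprocess(text: str) -> list[str]:
    """Tokenize on whitespace and drop stop words (SWR)."""
    tokens = clean_tweet(text).split()
    return [t for t in tokens if t not in STOP_WORDS]

if __name__ == "__main__":
    sample = "Loving the new phone!!! Battery life is great :) #gadgets @BrandX https://example.com"
    print(preprocess(sample))  # ['loving', 'new', 'phone', 'battery', 'life', 'great']
```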
Data science vs. Data scientist by Jothi Periasamy - Peter Kua
This document discusses data science vs data scientists and outlines key competencies for data scientists. It defines data science as modernizing existing analytics and data solutions using new data sources, formats, architectures, and techniques. The document compares traditional and modern approaches to data and analytics. It also discusses the skills required of entry-level vs senior data scientists, noting that enterprise data scientists require strong industry and business process skills while focusing on data, analytics, communication and technical abilities. The document provides an overview of the roles, responsibilities and deliverables of data scientists on enterprise projects.
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds (happiestmindstech)
The big impact of Big Data in the post-modern world is unquestionable, un-ignorable and unstoppable today. While there are certain discussions around Big Data being really big, here to stay or just an over-hyped fad, the facts shared in the following sections of this whitepaper validate one thing: there is no knowing the limits and dimensions that data in the digital world can assume.
Intro to Data Science for Non-Data Scientists - Sri Ambati
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document provides an overview of the data science process and tools for a data science project. It discusses identifying important business questions to answer with data, extracting relevant data from sources, cleaning and sampling the data, analyzing samples to create models and check hypotheses, applying results to full data sets, visualizing findings, automating and deploying solutions, and continuously learning and improving through an iterative process. Key tools mentioned include Hadoop, R, Python, Excel, and various data wrangling, analysis, and visualization tools.
The document is a summary of the 2015 Data Science Salary Survey conducted by O'Reilly Media. Over 600 respondents from various industries completed the anonymous online survey about their demographics, tasks, tools used, and compensation. Key findings include that the top four tools - SQL, Excel, R, and Python - remained the same as the previous year. Use of Spark and Scala has grown significantly compared to last year, and their users tend to earn more. Even when controlling for other factors, women are paid less than men. About 40% of the variation in salaries can be explained by the data provided in the survey.
“Semantic Technologies for Smart Services” - diannepatricia
Rudi Studer, Full Professor in Applied Informatics at the Karlsruhe Institute of Technology (KIT), Institute AIFB, presentation “Semantic Technologies for Smart Services” as part of the Cognitive Systems Institute Speaker Series, December 15, 2016.
Using technology intelligence tools, companies can cut the time spent on research and development from weeks or months to seconds or minutes. Technology intelligence refers to identifying technological opportunities and threats that could impact a company's future growth. These tools provide contextual access to relevant information and insights by combining web content, scientific journals, and patents with search technology and analysis. For example, a company could search for ways to reduce energy consumption and the tool would return a summary of solutions from various categories, such as approaches from the EPA and Department of Energy, in under a minute. This represents a shift from traditional research methods to quickly gaining actionable intelligence through intuitive searches.
KM - Cognitive Computing overview by Ken Martin 13Apr2016 - HCL Technologies
This document provides an introduction to cognitive computing and how it relates to knowledge management strategies. It begins with an overview of Ken Martin's background and the agenda. It then defines key cognitive computing concepts and technologies like natural language processing, machine learning, and pattern recognition. The document contrasts traditional and cognitive systems, noting cognitive systems are interactive, self-learning, and expand conversations. It maps cognitive capabilities to the KM lifecycle, showing how capabilities like natural language processing, text mining, and social network analysis can enhance each stage.
This document discusses Oracle's approach to big data and information architecture. It begins by explaining what makes big data different from traditional data, noting that big data refers to large datasets that are challenging to store, search, share, visualize, and analyze due to their volume, velocity, and variety. It then provides an overview of big data architecture capabilities and describes how to integrate big data capabilities into an organization's overall information architecture. The document concludes by outlining some key big data architecture considerations and best practices.
This presentation was prepared by one of our renowned tutors, "Suraj".
If you are interested in learning more about Big Data, Hadoop or Data Science, join our free introduction class on 14 Jan at 11 AM GMT. To register your interest, email us at [email protected]
Pay no attention to the man behind the curtain - the unseen work behind data ... - mark madsen
Goal: explain the nature of the work of an analytics team to a manager, and enable people on those teams to explain what a data science team needs to a manager.
It seems as if every organization wants to enable analytical decision-making and embed analytics into operational processes. What can you do with analytics? It looks like anything is possible. What can you really do? Probably a lot less than you expect. Why is this? Vendors promise easy-to-use analytics tools and services but they rarely deliver. The products may be easy, but the work is still hard.
Using analytics to solve problems depends on many factors beyond the math: people, processes, the skills of the analyst, the technology used, the data. Technology is the easy part. Figuring out what to do and how to do it is a lot harder. Despite this, fancy new tools get all the attention and budget.
People and data are the truly hard parts. People, because many believe that data is absolute rather than relative, and that analytic models produce an answer rather than a range of answers with varying degrees of truth, accuracy and applicability. Data, because managing data for analytics is a nuanced, detail-oriented and seemingly dull task left to back-office IT.
If your goal is to build a repeatable analytics capability rather than a one-off analytics project then you will need to address the parts that are rarely mentioned. This talk will explain some of the unseen and little-discussed aspects involved when building and deploying analytics.
The document outlines the syllabus for a course on data mining and data warehousing from Maulana Abul Kalam Azad University of Technology, West Bengal. It covers 7 units that discuss topics like introduction to data mining, data warehousing concepts, data mining techniques like decision trees and neural networks, mining association rules using various algorithms, clustering techniques, classification techniques, and applications of data mining. It also provides details on some core concepts like the stages of the knowledge discovery process, data mining functionalities, and classification of data mining systems.
Smart Data Slides: Data Science and Business Analysis - A Look at Best Practi... - DATAVERSITY
Google “citizen data scientist” today and you will see about 1M results. That number is data. It may be interesting, but it is meaningless without context. Sometimes it appears that we are drowning in data from systems and sensors but starving for insights. We definitely produce more of the former than the latter, which has created demand for more powerful tools to simplify the process and lower the skills requirement for analysis. As vendors build systems to meet this demand, we hear about the coming “democratization” of big data as more people at varying levels within organizations are empowered to find meaning and improve their own performance with data-driven insights. This is a good thing, but it does require caution.
To paraphrase Col Jessup in A Few Good Men: You want answers? You can’t handle the data.
In this webinar, we will survey emerging approaches to simplifying analysis, and discuss the benefits, dangers, and skills required for individuals and organizations to thrive in the brave new world of analytics everywhere, for everyone.
How the Analytics Translator can make your organisation more AI driven - Steven Nooijen
The document discusses how the Analytics Translator role can help organizations become more AI-driven by bridging the gap between business and technology. The Analytics Translator collects and prioritizes ideas, develops business cases for AI solutions, guides the solution development process, and drives adoption. Characteristics of a good Analytics Translator include understanding both business and AI, taking ownership, and operating at the intersection of UX, technology, and business. Developing this role is important for companies to successfully create impact and value from data and AI.
A SMART Seminar conducted on 3 May 2013 by Ian Bertram.
Leveraging information for decision making, assessing its value and ensuring frictionless sharing of information within the enterprise and beyond is what will fuel success in the current and future economy. New use cases with insatiable demand for real-time access to socially mediated and context-aware insights make information management in the 21st century dramatically different.
For more information, see http://goo.gl/a6F2c
iTrain Malaysia: Data Science by Tarun Sukhani - iTrain
The document provides an overview of data science and opportunities in the field. It discusses what data science and big data are, key components of data science like the "4 V's" of big data, and what a data scientist's skills and roles are. It also covers demand and opportunities in data science, giving examples of applications in different industries. It proposes an education framework for learning skills like coding, mathematics, statistics, machine learning and software engineering needed for a career in data science.
Data Scientist has been regarded as the sexiest job of the twenty-first century. As data in every industry keeps growing, the need to organize, explore, analyze, predict and summarize is insatiable. Data Science is creating new paradigms in data-driven business decisions. As the field emerges from its infancy, a wide range of skill sets is becoming an integral part of being a Data Scientist. In this talk I will discuss the different data-driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
AI and Legal Industry - Executive Overview - Graeme Wood
Artificial intelligence and semantic computing technologies can help address challenges facing the legal industry. AI can perform tasks like legal research and contract analytics to help lawyers. It works by analyzing both structured and unstructured data using natural language processing. Semantic computing finds relevant information by understanding relationships between concepts. The legal industry should develop a data strategy, capture different types of data, hire skilled talent, and implement analytic tools to start leveraging AI. This can help automate some legal work and make professionals more efficient.
This document provides an overview of data mining. It introduces data mining and its goals, which include prediction, identification, classification, and optimization. The typical architecture of a data mining system is explained, including its major components. Common data mining techniques like classification, clustering, and association are also outlined. Examples are provided to illustrate techniques. The document concludes by discussing advantages and uses of data mining along with some popular data mining tools.
1. Data mining involves extracting useful patterns and knowledge from large amounts of data. It can help uncover hidden patterns and relationships to help organizations make better decisions.
2. The document discusses various data mining techniques like classification, clustering, association rule mining and describes how each technique can be applied.
3. It also covers important aspects of data mining like the steps in the knowledge discovery process, different types of databases, visualization techniques, and major issues in data mining.
Bio IT World 2019 - AI For Healthcare - Simon Taylor, Lucidworks
1) An AI system implemented at Johns Hopkins Hospital helped optimize hospital operations and bed assignment. It allowed beds to be assigned 30% faster.
2) This reduced the need to keep surgery patients in recovery rooms longer than necessary by 80% and cut wait times for ER patients to receive beds by 20%.
3) The efficiencies also allowed the hospital to accept 60% more transfer patients from other hospitals.
The document summarizes an informational webinar about efficiently handling subject access requests through automation. It discusses the growing volume and complexity of subject access requests, as well as the challenges of the manual process. The webinar promotes automating tasks like deduplication, redaction, classification, searching, and producing documents through a software solution from ZyLAB that can help organizations scale to meet demand while reducing costs and risks. Automation through ZyLAB's eDiscovery platform is presented as helping make the subject access request process more efficient.
This talk is an introduction to Data Science. It explains Data Science from two perspectives: as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
Evidence Data Preprocessing for Forensic and Legal Analytics - CSCJournals
The document discusses best practices for preprocessing evidentiary data from legal cases or forensic investigations for use in analytical experiments. It outlines key steps like identifying the analytical aim or problem based on the case scope or investigation protocol, and understanding the case data through assessment and exploration of its format, features, quality, and potential issues. Challenges of working with common text-based case data like emails and social media posts are also discussed. The goal is to clean and transform raw data into a suitable format for machine learning or other advanced analytical techniques while maintaining integrity and relevance to the case.
Demystifying analytics in e discovery white paper 06-30-14 - Steven Toole
The document discusses analytics technologies used in eDiscovery and information governance. It describes how analytics can help reduce document review costs by identifying relevant documents through techniques like clustering, conceptual search, and auto-categorization. Applying analytics to proactively organize a company's electronic records before litigation arises helps keep costs low and investigations more efficient. The key benefit of analytics is reducing the number of non-relevant documents reviewers need to examine, thereby saving time and money in the discovery process.
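As an illustration of the clustering technique mentioned above, the sketch below groups a handful of toy documents by topic using TF-IDF vectors and k-means from scikit-learn. The documents and the choice of two clusters are assumptions made for the example, not part of the white paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy document set standing in for a review corpus.
docs = [
    "invoice payment for consulting services",
    "quarterly invoice and payment schedule",
    "meeting notes on product launch marketing",
    "marketing plan for the product launch event",
]

# Represent each document as a TF-IDF vector, then cluster.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, doc)  # invoice-related and marketing-related documents fall into separate clusters
```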
Understanding Data Science: Unveiling the Basics
What is Data Science?
Data science is an interdisciplinary field that combines techniques from statistics, mathematics, computer science, and domain knowledge to extract insights and knowledge from data. It involves collecting, processing, analyzing, and interpreting large and complex datasets to solve real-world problems.
Importance of Data Science
In today's data-driven world, organizations are inundated with data from various sources. Data science allows them to convert this raw data into actionable insights, enabling informed decision-making, improved efficiency, and innovation.
Intersection of Data Science, Statistics, and Computer Science
Data science borrows heavily from statistics and computer science. Statistical methods help in understanding data patterns, while computer science provides the tools to process and analyze large datasets efficiently.
Key Components of Data Science
Data Collection and Storage
The first step in data science is gathering relevant data from various sources. This data is then stored in databases or data warehouses for further processing.
Data Cleaning and Preprocessing
Raw data is often messy and inconsistent. Data cleaning involves removing errors, duplicates, and irrelevant information. Preprocessing includes transforming data into a usable format.
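A minimal cleaning sketch with pandas, using a made-up customer table to show the steps mentioned here (dropping duplicates, handling missing values, fixing types). The column names and fill rules are illustrative assumptions:

```python
import pandas as pd

# Small made-up dataset with typical problems: a duplicate row, a missing age, ages stored as text.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "age": ["34", "29", "29", None],
    "country": ["NL", "US", "US", "DE"],
})

clean = (
    raw.drop_duplicates()                                                # remove the duplicated customer 2
       .assign(age=lambda d: pd.to_numeric(d["age"], errors="coerce"))   # convert text to numbers
)
clean["age"] = clean["age"].fillna(clean["age"].median())                # impute the missing age

print(clean)
```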
Exploratory Data Analysis (EDA)
EDA involves visualizing and summarizing data to uncover patterns, trends, and anomalies. It helps in forming hypotheses and guiding further analysis.
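A short EDA sketch in the same spirit: summary statistics, category counts and a correlation check on an assumed sales table (the column names and values are invented for the example):

```python
import pandas as pd

# Assumed example data: daily ad spend, revenue and region.
sales = pd.DataFrame({
    "ad_spend": [100, 150, 80, 200, 120],
    "revenue":  [900, 1300, 700, 1800, 1000],
    "region":   ["north", "south", "north", "south", "north"],
})

print(sales.describe())                       # central tendency and spread of numeric columns
print(sales["region"].value_counts())         # how observations are distributed over categories
print(sales[["ad_spend", "revenue"]].corr())  # a first look at the ad spend / revenue relationship
```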
Machine Learning and Predictive Modeling
Machine learning algorithms are used to build predictive models from data. These models can make predictions and decisions based on new, unseen data.
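For instance, here is a minimal scikit-learn sketch of training a model and scoring new, unseen data; the churn-style toy data and feature meanings are assumptions made for the example:

```python
from sklearn.linear_model import LogisticRegression

# Toy training data: [monthly_usage_hours, support_tickets] -> churned (1) or stayed (0).
X_train = [[40, 0], [35, 1], [5, 4], [8, 3], [50, 0], [3, 5]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression().fit(X_train, y_train)

# Predict for customers the model has never seen.
X_new = [[45, 1], [4, 6]]
print(model.predict(X_new))        # e.g. [0 1]: the heavy user likely stays, the struggling user likely churns
print(model.predict_proba(X_new))  # class probabilities behind those decisions
```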
Data Visualization
Visual representations of data, such as graphs and charts, help in understanding complex information quickly. Data visualization aids in conveying insights effectively.
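A small matplotlib example of turning numbers into a chart, using assumed monthly revenue figures:

```python
import matplotlib.pyplot as plt

# Assumed monthly revenue figures for the example.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
revenue = [12.1, 13.4, 11.8, 15.2, 16.0]  # in thousands

plt.figure(figsize=(6, 3))
plt.bar(months, revenue, color="steelblue")
plt.ylabel("Revenue (k$)")
plt.title("Monthly revenue")
plt.tight_layout()
plt.show()  # a trend that is hard to spot in a table becomes obvious in the chart
```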
The Data Science Process
Problem Definition
The data science process begins with understanding the problem you want to solve and defining clear objectives.
Data Collection and Understanding
Collect relevant data and understand its context. This step is crucial as the quality of the analysis depends on the quality of the data.
Data Preparation
Clean, preprocess, and transform the data into a suitable format for analysis. This step ensures that the data is accurate and ready for modeling.
Model Building
Select appropriate algorithms and build predictive models using machine learning techniques. This step involves training and fine-tuning the models.
Model Evaluation and Deployment
Evaluate the model's performance using metrics and test datasets. If the model performs well, deploy it for making predictions on new data.
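A sketch of the evaluation step described here: hold out a test set, score the model, and persist it for deployment. The synthetic data and the file name are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Synthetic data standing in for a real, prepared dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# If the metrics are acceptable, persist the model so it can serve new predictions.
joblib.dump(model, "churn_model.joblib")
```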
Technologies Driving Data Science
Programming Languages
Languages like Python and R are widely used in data science due to their extensive libraries and versatility.
Machine Learning Libraries
Libraries like Scikit-Learn and TensorFlow provide ready-made implementations of machine learning algorithms, which simplifies building, training and evaluating models.
Introduction to Data Science and Data Analytics - VrushaliSolanke
Data science involves extracting meaningful insights from raw and structured data using scientific methods, technologies, and algorithms. It is a multidisciplinary field that uses tools to manipulate and analyze large amounts of data to find new and useful information. Data science uses powerful hardware, programming, and efficient algorithms to solve data problems and is the future of artificial intelligence. It involves collecting, preparing, analyzing, visualizing, managing, and preserving large data sets. Examples of data science applications include smart watches and Tesla's use of deep learning for self-driving cars.
District Office of Info and KM - Proposed - by Joel Magnussen - 2004 - Peter Stinson
The document discusses the potential benefits of improved information sharing and knowledge management. It envisions a future where everyone within an organization has access to all relevant information whenever needed. This would allow for better decision-making, more efficient responses to issues, and continuous learning from past experiences and events. The document outlines several initiatives underway to build an integrated information framework with these goals.
The document discusses data mining and provides definitions and explanations of key concepts. It defines data mining as the process of discovering patterns in large data sets involving methods from statistics, machine learning, and database systems. It describes the main components of data mining as including classification, association rule learning, and clustering. Examples of real-world applications are also given such as market basket analysis, fraud detection, and scientific research.
This document provides an overview of data mining. It defines data mining as the process of discovering novel and useful patterns from large amounts of data. The document outlines the main components of data mining, distinguishing it from regular data analysis. It also discusses the data mining process, major data mining techniques like classification and clustering, sources of data, challenges, and advantages. The goal of data mining is to extract useful knowledge from vast amounts of data.
The document discusses how machine learning and natural language processing can be used to analyze large amounts of text data from sources like emails in order to identify patterns and predict future behaviors. It notes that while human language contains a lot of useful information, it is also messy and ambiguous. The document proposes using techniques like machine learning algorithms, statistical models, and human analysis of key risk indicators to help reduce noise and increase the meaningful signals that can be extracted from text data.
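A toy illustration of reducing noise with key risk indicators: a plain-Python filter that flags emails containing assumed risk phrases, so that only a small, meaningful subset is passed on for closer human or machine analysis. The phrase list and emails are invented for the example:

```python
# Assumed key-risk-indicator phrases; a real list would be tuned with domain experts.
RISK_PHRASES = ["delete this email", "off the books", "keep this between us", "special payment"]

emails = [
    "Lunch at noon? The usual place.",
    "Please keep this between us until the audit is over.",
    "Attached is the Q3 report for review.",
    "Route the special payment through the other account.",
]

def flag(email: str) -> list[str]:
    """Return the risk phrases found in an email (case-insensitive)."""
    text = email.lower()
    return [p for p in RISK_PHRASES if p in text]

for email in emails:
    hits = flag(email)
    if hits:  # only the flagged messages move on to deeper review
        print(f"FLAGGED ({', '.join(hits)}): {email}")
```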
Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization's competitive advantage.
The document introduces data mining and knowledge discovery in databases. It discusses why data mining is needed due to large datasets that cannot be analyzed manually. It also covers the data mining process, common data mining techniques like association rules and decision trees, applications of data mining in various domains, and some popular data mining tools.
This document provides an overview of data mining and knowledge discovery in databases. It discusses why data mining is needed due to large volumes of data, describes the data mining process including data preparation, transformation, mining methods and model evaluation. Specific data mining techniques discussed include association rule mining to find frequent patterns in transactional data and decision tree learning as a supervised learning method to classify instances.
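To make the association-rule idea concrete, the short sketch below computes support and confidence for item pairs in a few invented transactions, a scaled-down version of what Apriori-style algorithms do on real transactional data:

```python
from itertools import combinations
from collections import Counter

# Invented market-basket transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

# Report rules A -> B whose support and confidence exceed small illustrative thresholds.
for pair, count in pair_counts.items():
    a, b = tuple(pair)
    support = count / n
    for x, y in [(a, b), (b, a)]:
        confidence = count / item_counts[x]
        if support >= 0.5 and confidence >= 0.75:
            print(f"{x} -> {y}  (support={support:.2f}, confidence={confidence:.2f})")
```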
This document provides an overview of big data by exploring its definition, origins, characteristics and applications. It defines big data as large data sets that cannot be processed by traditional software tools due to their size and complexity. Doug Laney is credited with defining the 3Vs of big data - volume, velocity and variety - in 2001. A variety of sectors where big data is used are discussed, including social media, science, retail and government. The document concludes by stating that we are in the age of big data due to new capabilities to analyze large data sets quickly and cost-effectively.
This document provides an overview of big data by exploring its definition, origins, characteristics and applications. It defines big data as large datasets that cannot be processed by traditional software tools due to size and complexity. The document traces the development of big data to the early 2000s and identifies the 3 V's of big data as volume, velocity and variety. It also discusses how big data is classified and the technologies used to analyze it. Finally, the document provides examples of domains where big data is utilized, such as social media, science, and retail, before concluding on the revolutionary potential of big data.
October 29, 2019, I was invited to present the keynote of the LegalTech Alliance meeting on eDiscovery and Big Data, in which 11 law departments from the Universities of Applied Sciences in the Netherlands participate.
eDiscovery is more important than ever. Future legal professionals must be able to deal with large electronic data sets so they can:
- Take decisions based on facts, not on guesses and assumptions;
- Answer information requests in a timely, accurate and complete manner;
- Avoid high costs, reputation damage, regulatory measures, business disruption and stress!
It is great that the LegalTech Alliance understands that need and that they embed eDiscovery in their educational programs.
Attached are the slides of the workshop where we presented the eDiscovery course (including the hands-on with ZyLAB) that we developed together with the University of Applied Sciences in Amsterdam.
Text mining scholtes - big data congress utrecht 2019 - jcscholtes
Wednesday September 18, for the second year in a row, I presented at the Big data Expo (#BigDataExpoNL) in Utrecht, the Netherlands on Text-Mining and how it can be used for big-data analytics on unstructured data, in particular for legal fact-finding missions and GDPR/AVG compliance use cases. Large crowds! Very successful event, good to see that big-data is such a hot topic these days!
Target-Based Sentiment Analysis as a Sequence-Tagging Task - jcscholtes
November 2019, Zoe Gerolemou successfully presented our paper on Target-Based Sentiment Analysis as a Sequence-Tagging Task at the Benelux Artificial Intelligence Conference (BNAIC 2019). In this research, we were not only able to detect sentiments with very high confidence, but also to determine WHO expressed these sentiments and about WHAT. Many great questions and several other very interesting presentations in the NLP session.
AI and applications in the legal domain - Studium Generale Maastricht 20191101 - jcscholtes
November 20, 2019, it was my great pleasure to present a special lecture on Artificial Intelligence and Applications in the Legal Domain. In this lecture I discuss how the development of machines that can learn, reason and act intelligently - Artificial Intelligence (AI) - is advancing rapidly in the legal domain. In some areas, machine intelligence has already surpassed the limits of what the brightest human minds are capable of achieving, especially in the field of eDiscovery and legal review of large data sets.
In others, machines still struggle with seemingly basic tasks. Nonetheless, breakthroughs in AI already have profound impact on the legal profession. AI is set to improve our world now and will continue to do so in the future. At the same time, there is the fear of losing control.
This lecture was part of a larger series on AI organized by our department of data science and knowledge engineering: https://www.maastrichtuniversity.nl/events/artificial-intelligence.
More information can be found here: https://textmining.nu
Augmented intelligence and the impact on your world in 2030 - jcscholtes
Technological innovation nowadays grows exponentially, while people are linear and cannot keep up with such rapid change. This leads to ever more societal discomfort and discontent. Our society will therefore have no choice but to put a figurative fence around all these new technological possibilities. Future innovations from the world of Artificial Intelligence will have to take this into account. This is what we call Augmented Intelligence.
In this short presentation, given on June 4, 2019 at Active Professionals in Rotterdam, I explain where these sudden rapid changes come from and which additional regulations and changes we can expect by 2030.
Text mining for Business Intelligence applications - jcscholtes
Text Mining for Business Intelligence
For most Business Intelligence specialists, the field of data mining is more familiar than that of text mining. A good example of data mining is the analysis of transaction data stored in relational databases, such as competitors' revenue figures or customers' financial transactions. Text is often much harder to work with because of its varying formats, ambiguity, inconsistency and errors.
However, more and more information is unstructured information in the form of text. Only a limited amount of information is stored in a structured format in a database. Think of social media, internet forums, websites, blogs or intranets (MS-SharePoint sites). Searching or analyzing these with traditional database or data mining techniques is impossible, because those techniques only work on structured information.
That is why the field of text mining focuses on developing advanced mathematical, statistical, linguistic and pattern-recognition techniques that make it possible to automatically structure and analyze unstructured information, to extract high-quality and relevant data, and thereby make the text as a whole easier to search.
High quality here refers in particular to the combination of relevance (in other words: finding the needle in the haystack) and gaining new, interesting insights.
These new techniques have already had a major impact in the world of law enforcement and intelligence services, as well as in the legal and financial domains. This lecture explains how Business Intelligence applications can benefit from these techniques when gathering valuable insights from open-source information or measuring sentiments and emotions about products, services or companies on social media and internet forums.
How can text-mining leverage developments in Deep Learning? Presentation at ... - jcscholtes
How can text-mining leverage developments in Deep Learning?
Text mining focuses primarily on extracting complex patterns from unstructured electronic data sets and applying machine learning for document classification. During the last decade, a generation of efficient and successful algorithms has been developed using bag-of-words models to represent document content and statistical and geometrical machine learning algorithms such as Conditional Random Fields and Support Vector Machines. These algorithms require relatively little training data and are fast on modern hardware. However, performance seems to be stuck around 90% F1 values.
In computer vision, deep learning has shown great success, and the 90% barrier has been broken in many applications. In addition, deep learning also shows new successes in transfer learning and self-learning approaches such as reinforcement learning. Dedicated hardware helped us overcome computational challenges, and methods such as training data augmentation solved the need for unrealistically large data sets.
So, it would make sense to apply deep learning to textual data as well. But how do we represent textual data? There are many different methods for word embeddings and as many deep learning architectures. Training data augmentation, transfer learning and reinforcement learning are not yet fully defined for textual data.
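To illustrate the classical baseline the abstract refers to (bag-of-words representations with a geometric classifier such as a Support Vector Machine), here is a minimal scikit-learn sketch; the tiny labeled corpus is invented and the pipeline is only indicative of the approach, not of the actual systems discussed:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny invented corpus: documents labeled as relevant (1) or not relevant (0) to an investigation.
docs = [
    "payment routed through offshore account",
    "invoice approved outside the normal process",
    "team lunch scheduled for friday",
    "minutes of the weekly status meeting",
]
labels = [1, 1, 0, 0]

# Bag-of-words (TF-IDF) features + linear SVM: the classic text-classification baseline.
clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(docs, labels)

print(clf.predict(["offshore payment approved without invoice"]))  # likely [1]
print(clf.predict(["agenda for the friday status meeting"]))       # likely [0]
```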
As part of the Haagse Hogeschool Law Program, a "Legal Analytics" course was organized in early 2019. ZyLAB was welcomed for a brief session on e-discovery for our international law students. Attached are the slides.
On March 21, the official opening of the Legal Tech Lab took place. During the afternoon, all partners, together with lecturers and students, gave a workshop. Below is the programme with an overview of the workshops. Attached is the eDiscovery presentation by ZyLAB.
Big Data and Data Science and the Judiciary - jcscholtes
In 2018, Prof. dr. H.J. van den Herik of the Leiden Centre of Data Science (LCDS), together with Prof. dr. ir. J.C. Scholtes of Maastricht University/ZyLAB and in cooperation with ZyLAB, gave training sessions in which several groups of 20 judges gained hands-on experience with Data Science and Big Data software.
The training focused on decision-support technology for the judiciary, drawing on ideas and concepts from the world of Big Data and Data Science.
Because much of the data of the Raad voor de Rechtspraak (Council for the Judiciary) is unstructured (textual) in nature, four sessions explained how the judiciary can make use of these kinds of techniques to support the judicial system.
How can Artificial Intelligence help me on the Battlefield? - jcscholtes
April 26, 2019, I was asked to present how Artificial Intelligence can help on the battlefield to the officers of the 11th Airmobile Brigade (11e Luchtmobiele brigade in Dutch) of the Dutch forces. The potential benefit of Artificial Intelligence on the battlefield is a very interesting, but also intriguing topic! Here you can find my slides. I have also written a blog on this topic, which contains several additional references and can be found as a LinkedIn article and as a blog on www.textmining.nu.
Big data analytics for legal fact finding - jcscholtes
This eLaw presentation was given on Thursday, March 14, at café 't Keizertje, Kaiserstraat in Leiden. The presentation contains a brief overview of big data analytics for legal fact finding.
Text mining scholtes - big data congress utrecht 2018 - jcscholtes
Much information nowadays no longer resides in databases, but in email collections, files on hard drives, or systems such as SharePoint. It is difficult to use this kind of data for Big Data analyses, even though it often contains the most information. Learn how text mining does make this possible.
The field of data mining is better known than that of text mining. A good example of data mining is the analysis of transaction data stored in relational databases, such as credit card payments or debit card transactions.
However, particularly in the legal domain, more than 90% of all information is unstructured. Only a limited amount of information is stored in a structured format in a database. Most of the information we work with daily is in text documents, emails, or multimedia files (speech, video, and photos). Searching or analyzing these with database or data mining techniques is impossible, because those techniques only work on structured information.
Structured information is easier to search, manage, organize, share and report on, not only for people but also for computers. Hence the desire to structure unstructured information, so that both people and computers can handle it better and we can apply familiar techniques and methods.
That is why the field of text mining focuses on developing advanced mathematical, statistical, linguistic and pattern-recognition techniques that make it possible to automatically analyze unstructured information, to extract high-quality and relevant data, and thereby make the text as a whole easier to search.
High quality here refers in particular to the combination of relevance (in other words: finding the needle in the haystack) and gaining new, interesting insights.
With text mining techniques, instead of searching for words we can search for linguistic patterns of words; this is searching and analyzing at a higher level!
These new techniques already have an incredible impact in the legal field: think of preparing M&A processes, compliance with the GDPR (AVG), and conducting large-scale data investigations for lawsuits, arbitration, regulatory requests or other legal matters.
Learn in this lecture how text mining is rapidly changing these rather conservative fields and how, with the help of smart technology, work can be done better, faster and more efficiently, consigning much of the dull, monotonous work to the past.
This document summarizes the benefits of using artificial intelligence and machine learning in the legal field. It discusses how AI can more effectively and efficiently analyze large amounts of documents through techniques like natural language processing. The document presents research that found AI outperformed humans in reviewing documents for relevance. It also provides examples of how AI is already being used for contract review, emotion detection, and assisting with investigations. The conclusion is that machine learning can review documents 3-20 times faster than humans and find 20-100% more relevant information, making the process smarter, better, and faster.
Legal Guide for issuing utility tokens in the EULawarton
This guideline is for anyone involved in launching a crypto project in the EU, especially if you're thinking about issuing utility tokens. With the EU’s Markets in Crypto-Assets Regulation (MiCA) coming into full effect in 2025, the regulatory landscape is changing fast. If your project gives users tokens to access your platform or services, it’s time to take MiCA seriously.
pdf Freedom of press a very important slide.pdfiffat91
Press freedom is instrumental to the fulfilment of the human right to freedom of expression, in particular the right to seek, impart and receive information and ideas of all kinds. A free press plays a vital role in holding governments and other powerful actors to account.
How San Diego Courts Handle Custody for Unmarried ParentsAndrson Smith
Dive into this presentation to know how San Diego courts handle custody for unmarried parents and the legal steps involved in securing parental rights. Learn about paternity, custody types, and how courts prioritize the child’s best interests every step of the way.
LEGAL RIGHTS FOR LAW STUDENTS AND ALSO FOR TEACHERS ALL ABOUT LAWYERSayeshakainat555
An overview of legal rights as a law subject: what they are, how they are classified, and the main types, with examples, aimed at law students and teachers.
M/S Bikaji Foods Int. Ltd vs M/S Desai Brothers Limited & Anr, Delhi High Court – October 11, 2023 (a case on trademark and geographical indication rights)
Key dates and events:
August 25, 2016 – Desai Brothers applied for the trademark "Pitaara Bikaneri Bhujia".
October 1, 2021 – The trademark application was abandoned after opposition from Bikaji Foods.
October 5, 2023 – First hearing; the defendants accepted court summons.
October 11, 2023 – Major hearing; the court observed packaging similarity and ordered an inventory check.
November 21, 2023 – A Local Commissioner was appointed to inspect the manufacturing premises.
January 30, 2024 – Next scheduled hearing for further arguments and review of the revised packaging.
Key observations:
1. Packaging similarity – The court compared the Bikaji and Pitaara Bikaneri Bhujia packets and found significant resemblance in color, design, and layout.
2. Manufacturer transparency issues – The court found confusion over who manufactures versus who packs the products.
3. Need for clear differentiation – The defendants must submit new packaging to avoid misrepresentation.
Key takeaways:
For businesses: Brand protection – companies must actively protect trademarks. Legal compliance – GI and trademark laws must be followed.
For consumers: Transparency – clearer branding helps avoid confusion.
For the legal system: GI and trademark laws are evolving, strengthening consumer protection and brand identity.
The Risks of Delaying KSA PDPL Compliance - Why Early Action MattersPyxos
As Saudi Arabia enforces its Personal Data Protection Law (PDPL), delays come at a high cost including penalties, lost trust, and emergency fixes. In partnership with the Riyadh Chamber of Commerce, this webinar was delivered by our CEO & Founder, James Beriker, along with team members Anurag Sushant and Varun Arora on May 5th.
In it, our team shared the lessons from GDPR, the risks of waiting to comply with KSA's PDPL, and how early action helps organizations save time, reduce costs, and stay in control.
In the second half of the webinar, our team walked through how to achieve and sustain compliance using our methodology and technology.
AI-Governance-Guidelines - Download the whitepaper now.DaviesParker
How New AI-Based Analytics Ignite a Productivity Revolution in eDiscovery
1. HOW NEW AI-BASED ANALYTICS IGNITE A PRODUCTIVITY REVOLUTION IN EDISCOVERY
ACEDS Webinar - August 24th, 2017
2. TODAY’S SPEAKERS
Mary Mack, Executive Director, ACEDS
Paul Starrett, specialist in electronic evidence and data science in the legal profession
Johannes Scholtes, CSO at ZyLAB and Professor of Text Mining at the University of Maastricht
4. Tools from the field of Artificial Intelligence and Data Science accelerate truth-finding missions in regulatory requests and internal investigations.
New AI-based analytics have drastically increased the speed and improved the quality of the eDiscovery process.
But what exactly are these new AI techniques, and how do they compare to all the other analytics we have been using for years?
TODAY’S AGENDA
5. THE BUZZ
e-Discovery & Artificial Intelligence: the new reality
AI becomes good business practice
6. WHAT ARE WE TALKING ABOUT?
“Analytics” is the discovery, interpretation, and communication of meaningful patterns in data. The terms “analytics” or “analysis” describe functions ranging from reporting and review metrics to sophisticated search and advanced data- and text-mining and machine learning applications. Benefits also range across various dimensions.
“Artificial Intelligence (AI) is a broad, complex field of research. AI includes tasks such as reasoning, problem solving, knowledge representation, planning, machine learning, natural language processing, perception, motion, social intelligence, and even creativity. The ultimate goal is the creation of some form of general intelligence.”
7. The Usual Suspects:
Exploding data volumes;
New types of data (multi-media, social, BYOD);
Exploding eDiscovery costs;
New regulations and compliance requirements
GDPR
Cyber-security requirements
More enthusiastic regulators, especially outside of the US.
WHY WE SHOULD CARE
8. DEALING WITH THE EDISCOVERY DATA WAVE
In eDiscovery, you never know in advance:
How much data you will have;
What type of data it will be, and thus what type of processing is required;
What workflow and iterations you will have.
Automation, AI, and Data Science are very CPU- and memory-intensive, so you need intelligent, highly scalable load balancing and resource allocation to prevent bottlenecks and deal effectively with the “Data Wave” in eDiscovery.
9. Better understand your data: the ability to make better strategic decisions.
Early Case Assessment: build and justify eDiscovery budgets, resources, and timelines.
Reduce data volumes: cut through the noise and zero in on documents of interest.
Take an investigative approach: organize and prioritize documents.
Reduce your eDiscovery cost: improve the productivity and precision of your team.
Better quality: see greater consistency in coding decisions across similar documents.
Speed up litigation.
WHY ANALYTICS?
10. Humans have cognitive limitations when processing and deriving insights from large-scale document sets; humans simply cannot successfully synthesize large volumes of data.
Technology will help lawyers work more efficiently, effectively, and enjoyably.
Grossman & Cormack*: “TAR was not only more effective than human review at finding relevant documents, but also much cheaper … Overall, the myth that exhaustive manual review is the most effective—and therefore the most defensible—approach to document review is strongly refuted.”
WHY AI-BASED ANALYTICS?
* Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” Richmond Journal of Law and Technology, Vol. XVII, Issue 3.
12. Structural analytics (aka syntactic analytics): file, document, and forensic property extraction; metadata filtering; saved (full-text) searches; email thread detection; email thread reduction; detection of missing emails in a thread; duplicate and near-duplicate detection; language identification; communication analysis; time-line visualizations; geo-mapping; … (a small near-duplicate sketch follows after this slide).
Conceptual analytics (aka semantic or meaning-based analytics): keyword expansion (taxonomy); content clustering; content-based categorization; conceptual search; sentiment and emotion mining; semantic content analysis; word clouds; topic modeling; …
Machine learning (data-driven, predictive analytics): technology-assisted review; contract clause detection and classification; privilege detection; …
WHAT KIND OF ANALYTICS HAVE WE SEEN?
STRUCTURE OF DATA
MEANING OF DATA
LEARN FROM DATA
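As an illustration of the structural analytics listed above, the sketch below shows one common way to flag near-duplicate documents: character-shingle TF-IDF vectors compared with cosine similarity. It is a minimal sketch assuming scikit-learn is available; the sample documents and the 0.8 threshold are made up for illustration, and this is not how any particular vendor implements it.

# Near-duplicate detection sketch: character-shingle TF-IDF + cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = {
    "doc1": "The parties agree to settle the dispute for $250,000 by March 31.",
    "doc2": "The parties agree to settle this dispute for $250,000 by March 31st.",
    "doc3": "Quarterly revenue figures are attached for your review.",
}

# 5-character shingles are fairly robust to small edits, OCR noise, and reformatting.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(5, 5))
matrix = vectorizer.fit_transform(docs.values())
similarities = cosine_similarity(matrix)

names = list(docs)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if similarities[i, j] > 0.8:  # tunable near-duplicate threshold (assumption)
            print(f"{names[i]} ~ {names[j]} (similarity {similarities[i, j]:.2f})")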
13. WHAT IS THE RELATION BETWEEN AI AND ANALYTICS?
eDiscovery needs:
Perception
Reading: OCR, handwriting detection, signature recognition
Listening: Audio search
Vision: Image classification
Language: Machine Translation
Intelligent Search
Machine Learning for search
Concept Clustering
Data Visualization
Text classification and categorization, at the level of the document, the paragraph (clause), or the sentence or phrase
AI provides the algorithms and evaluation methods:
Machine Learning
Decision trees
Support Vector Machines
Deep Learning (CNN)
Topic Modeling / Concept Search
Hierarchical Clustering
LSI
LDA
NMF
Natural Language Processing (NLP)
Shallow Parsing
Deep Parsing
Co-reference resolution
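To make the algorithm list above concrete, here is a minimal topic-modeling sketch using NMF over TF-IDF vectors with scikit-learn; the four-message corpus and the choice of two topics are assumptions for illustration only, not part of the original presentation.

# Topic-modeling sketch: NMF over TF-IDF (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

emails = [
    "Please wire the payment to the new account before the audit.",
    "The quarterly audit report shows a payment discrepancy.",
    "Lunch on Friday? The new cafeteria menu looks great.",
    "Friday lunch works for me, see you at the cafeteria.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(emails)

nmf = NMF(n_components=2, random_state=0)  # two latent topics (assumption)
doc_topics = nmf.fit_transform(tfidf)      # per-document topic weights

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(nmf.components_):
    top_terms = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")

LDA or LSI could be substituted for NMF in the same pipeline; the point is that documents are grouped by what they are about rather than by the exact keywords they contain.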
18. PERCEPTION: OCR ON BITMAPS
ZyLAB: people often screenshot or take pictures of such information, just in case or to remember… ZyLAB will pick up such images, OCR them, and make them findable.
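As a generic illustration of what OCR on bitmaps involves (not a description of ZyLAB's pipeline), the sketch below uses the open-source Tesseract engine via pytesseract; the file name is hypothetical and Tesseract itself must be installed separately.

# Generic OCR sketch using Tesseract via pytesseract (illustrative only).
from PIL import Image
import pytesseract

def ocr_image(path: str) -> str:
    """Return the text recognized in a screenshot or photographed document."""
    return pytesseract.image_to_string(Image.open(path))

text = ocr_image("screenshot_of_contract.png")  # hypothetical file name
print(text)
# The recognized text can then be added to the full-text index so the image
# becomes searchable alongside ordinary documents.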
19. STRUCTURAL: UNPACK EMBEDDED CONTENT
ZyLAB:
• Every embedded item is extracted and OCR-ed if needed.
• Search & Find
• Show in document family
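As a rough, standard-library-only illustration of unpacking embedded content (again, not ZyLAB's implementation), the sketch below walks the MIME parts of a single e-mail; the file name is hypothetical, and a real pipeline would also recurse into archives, Office documents, and other containers.

# Unpack the parts of an e-mail so each attachment can be indexed or OCR-ed.
from email import policy
from email.parser import BytesParser

def iter_parts(path: str):
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip container parts, yield only leaf content
        yield part.get_filename() or "(inline)", part.get_content_type(), part.get_payload(decode=True)

for name, content_type, payload in iter_parts("custodian_item.eml"):  # hypothetical file
    print(name, content_type, len(payload or b""), "bytes")
    # Image parts would be routed to OCR; text parts go straight to the index.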
24. Question, and the entities or patterns that address it:
Who is it about? PERSON, COMPANY, ORGANIZATION, EMAIL ADDRESS
What is it about? Results of topic modeling and concept clustering
When did it happen? DATE, TIME, MONTH, DAY, WEEK, YEAR
Where did it happen? ADDRESS, CITY, COUNTRY, CONTINENT, DEPARTMENT, and other geo-locations
Why did it happen? Sentiments, emotions, and cursing
How did it happen? Combining entities and facts
How much/often did it happen? Quantitative measures such as amounts, currencies, and other numbers; also frequencies and averages of entity occurrences
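A minimal sketch of how extracted entities can be mapped onto these questions, using spaCy's pretrained named-entity recognizer; the label-to-question mapping and the example sentence are illustrative assumptions rather than the mapping used in the presentation.

# Map named entities onto Who / Where / When / How much (illustrative only).
from collections import defaultdict
import spacy

QUESTION_FOR_LABEL = {
    "PERSON": "Who", "ORG": "Who",
    "GPE": "Where", "LOC": "Where", "FAC": "Where",
    "DATE": "When", "TIME": "When",
    "MONEY": "How much", "PERCENT": "How much", "CARDINAL": "How much",
}

nlp = spacy.load("en_core_web_sm")  # requires the small English model to be installed
doc = nlp("On 12 March 2016, Acme Corp wired $2.4 million to a broker in Zurich.")

answers = defaultdict(set)
for ent in doc.ents:
    question = QUESTION_FOR_LABEL.get(ent.label_)
    if question:
        answers[question].add(ent.text)

for question, values in answers.items():
    print(f"{question}: {sorted(values)}")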
25. MORE DETAILED INSIGHTS
More interesting is to combine the W's: for instance, look for Who is Where, or What happened When.
Who – Who
Who – Why
When – What
26. The era of traditional keyword and Boolean search seems to be over. Even the most brilliant query results in too many hits, and reviewing them takes too much time and too many resources.
People do not know exactly what to look for, which keywords to use, or how to spell them.
The quality of traditional search is much lower than searchers think (80% perceived versus 20-40% actual quality).
Only highly skilled searchers who master all (advanced) query options are able to get close to 80%. Even then, they cannot be sure that they did in fact find 80% of all relevant documents. This is another problem with measuring recall: you never know what you miss.
MACHINE LEARNING: THE NEW SEARCH
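A minimal sketch of the machine-learning idea behind technology-assisted review, assuming scikit-learn: a classifier trained on a handful of reviewer decisions scores the unreviewed documents so the likeliest relevant ones surface first. The documents, labels, and model choice are illustrative assumptions, not any vendor's product.

# Technology-assisted-review sketch: TF-IDF + logistic regression (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed set coded by human reviewers (1 = relevant, 0 = not relevant).
seed_docs = [
    "Board approved the off-book payment to the intermediary.",
    "Invoice routed through the shell company as discussed.",
    "Reminder: the office holiday party starts at 5 pm.",
    "Cafeteria menu for next week attached.",
]
seed_labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_docs, seed_labels)

# Score unreviewed documents and surface the most likely relevant ones first.
unreviewed = [
    "Please keep the intermediary payment off the quarterly report.",
    "New parking rules take effect on Monday.",
]
scores = model.predict_proba(unreviewed)[:, 1]
for document, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {document}")

In practice this loop is typically iterative: reviewers code the highest-ranked documents, the model is retrained, and the process repeats until the measured recall is acceptable.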
29. Have we found all relevant information? How complete is the data we sent to the regulator? Machine learning!
During this process, several quantitative measures can be calculated, such as precision, recall, F-values, and the precision of the return set. Based on these measurements, one can describe exactly how much of the relevant information has been found at which moment in the process.
HOW CAN WE MEASURE RECALL
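For concreteness, a small sketch of how those measures are computed from a reviewed sample; the counts are hypothetical, and a real matter would rely on statistically valid sampling to estimate them.

# Precision, recall, and F1 from reviewer-validated counts (hypothetical numbers).
def review_metrics(true_positives: int, false_positives: int, false_negatives: int):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: the model flagged 400 documents; reviewers confirmed 320 as relevant,
# and sampling suggests 80 relevant documents were missed.
p, r, f1 = review_metrics(true_positives=320, false_positives=80, false_negatives=80)
print(f"precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")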
34. ZyLAB’s Direct Collection delivers tremendous time savings in getting data ready for early case assessment and (first-pass) review. Direct Collection drastically reduces the cost and risks of downloading/uploading data or shipping tapes and hard disks around.
ZyLAB’s Deep Processing allows you to automatically reduce your data volumes before you send them on for review, without getting into trouble or being accused of data spoliation. Only if every component of the data is searchable can automated tools be used to reduce it.
Using ZyLAB’s Review Accelerators you can minimize the most expensive and time-consuming part of the eDiscovery process: TAR, batch tagging, sampling, redaction, email trails, …
Litigants use ZyLAB’s Early Case Assessment to quickly understand the facts and merits of a case, identify key custodians, and recognize critical information so they can develop an effective and realistic litigation strategy.
BENEFITS TO IN-HOUSE COUNSEL
35. BENEFITS TO LAW FIRMS
ZyLAB covers multiple eDiscovery use cases. One platform: more cases, more volume, better pricing.
No need to involve any third parties.
Bill the hours for project management and data science (machine learning) as well.
DIY: upload data, start reviewing with your team almost immediately, and bill the hours.
Find out what really happened with ZyLAB’s deep search and analytics.
Expand the review team.
Replace the bottom of the traditional earnings pyramid with “review robots”: make more margin.
Be more competitive.
Do more work with your current team: never pass on new opportunities because of capacity problems.
Less risk of errors and of missing key issues, and therefore less risk of liability claims and of higher insurance premiums.
37. “ZYLAB TAKES CARE OF THE PROCESS, SUPPORTS THE LAWYER BY THINKING COMMERCIALLY AND PROVIDES COMFORT WITH THE USE OF ADVANCED TECHNOLOGY”
Ruben Elkerbout, anti-trust lawyer and partner with Stek Lawyers