Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web
AI
Methodology Optimisation Automation Analysis & Synthesis
7 Years of Business Intelligence Development
2015 2016 2017 2018 2019 2020 2021
Climbing on the Matterhorn
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings significant time savings, but also reveals the need to understand and accept the limitations of the technology.
A workshop report
Image: geniusgadget.com
HUMAN INTELLIGENCE + ARTIFICIAL INTELLIGENCE = AUGMENTED INTELLIGENCE
Prepare the case studies by exposing the possibilities and limits of the AI-assisted automatic categorization process.
Discuss the challenges faced in setting up this process:
• Definition of the training set (type of data to be processed: patents, NPL (non-patent literature), or both)
• Development of classifiers (single vs. multi, selected fields, margin of error to be defined)
• Volume handled: > 300,000
Process advantages:
• Collaboration with experts in the field
• Multi-categorization
• Ability to select the fields to analyze
• Combining the AI classification tool with a collaborative monitoring tool to get the best of both worlds
Delivery of results in various forms, with further developments possible on demand
Monitor
o Different types of data to process (patent, NPL, web, internal documents)
o Increasing volume of information to monitor
o Multiple data sources to consult
o Limited time and resources
How to
o Process this ever-increasing flow of data without devoting too much time and resources?
o Boost customer efficiency and bring customer expertise where it is most valuable?
Automate
o Automate the monitoring process from end to end
o Optimize the data classification process by integrating AI
Automate
o Provide data selection and classification accuracy close to that of an expert, with greater consistency than human reviewers
o Save time and resources
o Process large volumes of data quickly and efficiently on a regular basis
SmartCat
Import → AI classification → Result
Input: Patent, NPL, Web, internal documents
Output: RAPID, export, synchronisation
Free yourself from doing repetitive tasks
Focus on what matters most: the result
SmartCat
Powered by
• Averbis
Integrated in
• RAPID
Designed to
• Process all types of data
• Handle large volumes of data
Empower you to
• Detect relevant documents
• Apply single or multi-label classifications
1. Provide a training set
2. Set the AI classifier
3. Run the learning process
4. Validate the prediction model
5. Run the classification process
6. Validate the AI classification
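The six steps can be walked through with a toy example. The sketch below is only an illustration: it uses a simple word-count classifier written for this note, not SmartCat or Averbis, and the documents and the category names "energy" and "watchmaking" are invented.

```python
from collections import Counter, defaultdict

def train(training_set):
    """Steps 2-3: learn per-label word counts from (text, label) pairs."""
    model = defaultdict(Counter)
    for text, label in training_set:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Return the label whose training vocabulary overlaps most with the text."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)

def accuracy(model, labelled_docs):
    """Step 4: share of held-out documents the model labels correctly."""
    hits = sum(predict(model, text) == label for text, label in labelled_docs)
    return hits / len(labelled_docs)

# 1. Provide a training set (toy data, invented for illustration)
training_set = [
    ("battery electrode lithium cell", "energy"),
    ("solar panel photovoltaic cell", "energy"),
    ("watch movement escapement spring", "watchmaking"),
    ("mechanical watch balance wheel", "watchmaking"),
]
# 2./3. Set the classifier and run the learning process
model = train(training_set)
# 4. Validate the prediction model on documents held back from training
held_out = [("lithium battery pack", "energy"), ("watch spring alloy", "watchmaking")]
validation_accuracy = accuracy(model, held_out)
# 5. Run the classification on new, unlabelled documents
new_docs = ["photovoltaic panel coating", "escapement wheel geometry"]
predictions = [predict(model, doc) for doc in new_docs]
# 6. Validate the AI classification: an expert reviews `predictions`
print(validation_accuracy, predictions)
```

In practice steps 4 and 6 are where the experts come in: a poor validation result sends the process back to step 1 or 2.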
Expert contribution: key during the definition and validation steps
Classification
• Balanced set
• Unambiguous classification
• Distinctive categories
Training set
• Field selection
• Classification mode: single vs. multi
Classifier
• Metrics validation
Prediction model
• Classification assessment
• Relevance labels assigned
o Precision
o Recall
o F1 score
Precision Recall F1-Score
1 1 1.00
0.5 0.5 0.50
0.9 0.5 0.64
0.9 0.9 0.90
0.8 0.8 0.80
0.7 0.9 0.79
0.1 0.9 0.18
0.2 0.9 0.33
0.3 0.8 0.44
0.4 0.8 0.53
0.5 0.8 0.62
0.6 0.9 0.72
0.7 0.9 0.79
0.8 0.9 0.85
0.9 0.9 0.90
1 1 1.00
1 1 1.00
1 1 1.00
1 1 1.00
1 1 1.00
[Chart: Precision, Recall and F1-Score for the 20 example rows above; vertical axis 0 to 1.2]
Precision of a classifier: share of the documents assigned to a category that actually belong to it
Recall of a classifier: share of the documents actually belonging to a category that the classifier assigned to it
F1-Score of a classifier: harmonic mean of Precision and Recall
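The F1 values in the table follow from the standard formula F1 = 2 · P · R / (P + R). A minimal sketch, with function names chosen here for illustration:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw counts for one category."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A few (precision, recall) pairs taken from the table
for p, r in [(0.9, 0.5), (0.1, 0.9), (0.5, 0.8)]:
    print(f"P={p:.1f} R={r:.1f} F1={f1_score(p, r):.2f}")
```

Because it is a harmonic mean, F1 is pulled towards the weaker of the two values: a recall of 0.9 cannot compensate for a precision of 0.1.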
Depends on
o Subject area
o Data quality
o Classification uncertainties and complexity
Contributes to
o Subject matter expert(s)
o Unambiguous and distinctive classification
o Delimited search scope
What we intended to do (and sometimes managed to do)
Raw data → One classifier → Final result
What we finally did
Raw data → Binary classifier → Good / Bad
Good → Classifier #1, Classifier #2, Classifier #3, … → Result #1, Result #2, Result #3, … → Final result
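The two-stage layout can be sketched as a binary filter followed by one yes/no classifier per category. Everything below is hypothetical: the keyword rules stand in for trained classifiers, the category names are invented, and dropping the "Bad" documents before stage two is an assumption about what the diagram shows.

```python
def is_relevant(doc):
    """Stage 1: hypothetical binary filter (Good / Bad)."""
    return "sensor" in doc.lower()

# Stage 2: one hypothetical yes/no classifier per category.
CATEGORY_CLASSIFIERS = {
    "optical": lambda doc: "optical" in doc.lower(),
    "medical": lambda doc: "medical" in doc.lower(),
    "automotive": lambda doc: "vehicle" in doc.lower(),
}

def classify(docs):
    """Run the cascade and return {document: [matching categories]}."""
    results = {}
    for doc in docs:
        if not is_relevant(doc):
            continue  # "Bad": assumed to be dropped before stage 2
        results[doc] = [name for name, clf in CATEGORY_CLASSIFIERS.items() if clf(doc)]
    return results

docs = [
    "Optical sensor for medical imaging",
    "Vehicle tyre compound",
    "Pressure sensor for vehicle brakes",
]
final_result = classify(docs)
print(final_result)
```

Running independent per-category classifiers lets one document receive several labels, which is what multi-categorization requires.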
Relevance rate estimated for each of the 3 monitoring processes implemented: >80%
Number of iterations done before reaching a suitable relevance rate: ~3
Time to multi-classify 1000 documents: 4 min
Fully automated process hosted in one place
Experts focus on the result
Patent, NPL, Web, internal documents
Import → Classification → Restitution
SmartCat
We did it!
Automated data upload Classification result
SmartCat
AI classification
Expert reviews
Weekly updates Expert evaluation
User communication
AI training based on
expert feedback
Case Study No 1: «enough time, no focus»
Major hurdles → Overcome by
• Implement a flexible and easy-to-use process → Developing RAPID in collaboration with renowned experts in the field
• Ambiguities or uncertainties when defining the classification and the training set → Providing reliable definition and selection
• Assess the classification quality → Involving motivated experts
• Shift noted from the initial request → Redefining the classification in agreement with the experts involved
• Synchronise data between RAPID and PS → Setting up an automated workflow compatible with RAPID and PS
• Reliability control → Real-time monitoring of every step of the automated process
Case Study No 2: «no time, no monitoring»
Set-up
o Choose a sufficiently broad monitoring strategy for the alert (criterion: find all the existing documents under observation or with oppositions)
o Train a classifier with all observation and opposition cases and the same quantity of clearly non-relevant documents
o Take two months of monitoring data → 4,600 newly published documents
o Configure SmartCat; result: 5 certainly relevant documents, 6 probably relevant documents and 62 potentially relevant documents
o Check the 11 certainly or probably relevant documents with Central IP → Yes, they are relevant.
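The balanced training set from the set-up (all known relevant cases plus the same number of clearly non-relevant documents) could be assembled roughly as follows. This is a sketch only: the function name, the document strings and the counts of 40 and 500 are invented for illustration.

```python
import random

def balanced_training_set(relevant_docs, non_relevant_docs, seed=0):
    """Pair every relevant document with an equal number of non-relevant ones."""
    if len(non_relevant_docs) < len(relevant_docs):
        raise ValueError("not enough non-relevant documents to balance the set")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    negatives = rng.sample(non_relevant_docs, len(relevant_docs))
    labelled = [(doc, "relevant") for doc in relevant_docs]
    labelled += [(doc, "non-relevant") for doc in negatives]
    rng.shuffle(labelled)
    return labelled

# Hypothetical inputs: 40 known cases, 500 clearly unrelated documents
relevant = [f"opposition case {i}" for i in range(40)]
non_relevant = [f"unrelated patent {i}" for i in range(500)]
training_set = balanced_training_set(relevant, non_relevant)
print(len(training_set))
```

Sampling the negatives down to the size of the positive class keeps the classifier from simply learning that almost everything is non-relevant.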
Set-up
[Bar chart: number of documents per classification band; horizontal axis 0 to 3000]
Relevant – very sure: 5
Relevant – sure: 6
Relevant – not sure: 62
Non-relevant – not sure: 601
Non-relevant – sure: 909
Non-relevant – very sure: 2823
Effect of additional training cycles
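The practical effect of the confidence bands is easiest to see as shares of the total. The sketch below uses the six counts as read from the chart; assigning the three larger values to the non-relevant bands follows the order in which they appear and is an assumption.

```python
# Document counts per band, as read from the chart
counts = {
    "Relevant – very sure": 5,
    "Relevant – sure": 6,
    "Relevant – not sure": 62,
    "Non-relevant – not sure": 601,
    "Non-relevant – sure": 909,
    "Non-relevant – very sure": 2823,
}

total = sum(counts.values())
# The two most confident "relevant" bands are the ones checked with Central IP
review_first = counts["Relevant – very sure"] + counts["Relevant – sure"]
relevant_all = review_first + counts["Relevant – not sure"]

print(f"classified documents: {total}")
print(f"to review first: {review_first} ({review_first / total:.2%})")
print(f"all 'relevant' bands: {relevant_all} ({relevant_all / total:.2%})")
```

The six bars add up to 4,406 documents, somewhat fewer than the 4,600 mentioned in the set-up; the slides do not explain the difference. Either way, the experts only need to look at a small fraction of the monitored documents first.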
Climbing on the Matterhorn
1. Establish a good training set
2. Configure the classifier system carefully
3. Don’t despair when your first attempt(s) fail(s)
4. Take a good guide
5. Study the AI system carefully, identify the gradients of convergence
6. Repeat steps 1-5 in cycles until you…
7. Reach the summit
8. Enjoy the view!
9. Be aware that every mountain is different
From the data lake to the key document
The Project Team
Jean-Baptiste Porier, Senior Data Analyst
David Borel, Head of Foresight Team
Harald Jenny, CEO
The time for AI implementation is now.
JACQUET DROZ 1
2002 NEUCHÂTEL
WWW.CENTREDOC.SWISS
INFO@CENTREDOC.CH
+41 32 720 51 31

AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web Harald Jenny (CENTREDOC, CH)
