
NDR 2007 A New Approach To Information Discovery: Perry Stokes (Mike Odell) July 20, 2006

This document discusses a new approach to information discovery called geographic text search. It is presented by Perry Stokes of MetaCarta, a company that uses natural language processing and proprietary geographic data modules to identify geographic references in unstructured documents and assign them geographic locations and confidence values. This allows documents to be searched and results to be viewed in a geospatial context. The challenges of traditional document search and text search are outlined, and it is explained that geographic text search provides faster, more comprehensive searches across multiple data sources and brings results directly into a decision-making GIS framework.

Uploaded by

Devananda Narah

NDR 2007

A New Approach to
Information Discovery
Perry Stokes
(Mike Odell)

July 20, 2006

1
Agenda

Who is MetaCarta?
What is Natural Language Processing?
Information Discovery Challenge
Approaches to Information Discovery
Geographic Text Search
  What is it?
  How does it work?
  How is it used?
  What is its value?
2
Who is MetaCarta?
The geographic text search company founded in 1999
by MIT researchers
Offices are in Boston, MA, Washington D.C., and Houston, TX

Energy Customers

Public Sector Customers
DARPA, CIA, U.S. Army, SOCOM, EPA, U.S. Navy, FBI, Homeland Security

Partners

Investors
Hunt Oil, Chevron, Sevin Rosen Funds, InQTel, FA Technology Ventures, Chisholm,
Solstice Capital
3
What is natural language
processing?
Natural language processing is used to tokenize the
parts of speech in electronic text documents collected
from shared drives, document management systems,
and the internet.
MetaCarta uses this innovative technology to identify
geographic references in the documents being
processed.
All geographic references are assigned a geographic
location and a confidence value using our proprietary
Geographic Data Module.
The result is a geographic index that ties each document to the
locations it mentions.
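
As a rough illustration of the steps above, the sketch below tokenizes text, looks each word up in a tiny table of place names, and attaches a location and a confidence value to every match. The gazetteer entries, the confidence numbers, and all function and class names are hypothetical; the Geographic Data Module is proprietary and is not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: name -> (latitude, longitude, confidence).
# Coordinates are approximate and the confidence values are made up.
GAZETTEER = {
    "houston": (29.76, -95.37, 0.9),
    "boston": (42.36, -71.06, 0.9),
    "libya": (27.0, 17.0, 0.95),
}

@dataclass
class GeoReference:
    name: str
    lat: float
    lon: float
    confidence: float
    offset: int  # character position of the match in the text

def tag_geographic_references(text: str) -> list[GeoReference]:
    """Split text into words and look each one up in the gazetteer."""
    refs = []
    for match in re.finditer(r"[A-Za-z]+", text):
        entry = GAZETTEER.get(match.group().lower())
        if entry is not None:
            lat, lon, conf = entry
            refs.append(GeoReference(match.group(), lat, lon, conf, match.start()))
    return refs

refs = tag_geographic_references("Offices are in Boston and Houston.")
for ref in refs:
    print(ref.name, ref.lat, ref.lon, ref.confidence)
```

A real system would also need to handle multi-word names and ambiguous names, which is where a per-reference confidence value becomes useful.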

4
Information Discovery Challenges

Internal Sources
EDMS
Shared Drives
Library Catalogs
Technical Databases
Published Documents
Technical Data in File Servers
Internal Network (peers)
GIS Layers
Satellite Images
Knowledge Portals
Corporate Websites
Archives
email

External Sources
News Events
Science and Industry Content Providers
E&P Industry Content Providers
Web content
Competitive Intelligence


5
Information Discovery Challenges

US National Academy of Science
80% of business is conducted on unstructured data
85% of all data stored is unstructured
60% annual growth rate for unstructured data
70 - 80% of unstructured data has some geographic reference
validated by actual Fortune 500 customer
70% of us are visual learners but text based displays dominate

IDC
35%-50% information is NOT found by typical search engines
30% of time spent:
Searching for non-existent documents
Failing to find existing information
Recreating what can't be found

6
Approaches to Information
Discovery: Document Search

Business Unit Shared Folder
Folder Count: 36, 216, 1296
Deeper folder structures
Manual Navigation
Optimized for one taxonomy
Thousands of files

7
Approaches to Information
Discovery: Document Search

3 hours later
still searching

8
Approaches to Information
Discovery: Text search

Thousands of best hits is still too many to read
Filtering on words alone is insufficient
Time consuming to review

Fast <1 second


Can search across
multiple collections
Does not bring results
into decision space

9
Approaches to Information
Discovery: Web Search

try asking for a place

The right text search answer isn't
always the right human answer.

10
Find IT Geographical Text
Search

Fast 1 second
Can search across
multiple collections
Brings results into GIS
decision space

11
Approaches to Information
Discovery : GIS

Must know
what you are looking for
that the information is stored in a database
and the attributes that you are using are included

Lease MC 441
Lease Database
Lease Document

12
How Does it Work?

Figure labels: Base GDM, Energy GDM, Custom Gazetteer

Bulletin Vol. 88 (2004), No. 13 (Supplement)
AAPG Annual Meeting
Dallas, Texas
April 18-21, 2004
Schumacher, Dietmar (1), Daniel Hitzman (1)
(1) Geo-Microbial Technologies, Inc, Ochelata, OK
ABSTRACT: Geochemical Exploration in North Africa: Recent
Successes from Algeria, Tunisia, and Egypt
Detailed geochemical and research studies document that
hydrocarbon microseepage from petroleum accumulations is common,
is predominantly vertical (with obvious exceptions in some geologic
settings), and is dynamic (responds quickly to changes in reservoir
conditions).

Crazy Horses and Mad Dogs
New Deep Fields Show Gulf Trends
The deep water play in the Gulf of Mexico -- the hot
exploration province for the last decade -- just keeps getting
deeper and hotter.
Recently BP-Amoco announced four major new discoveries,
two of which are in over 6,000 feet of water. The most
important of these new finds is Crazy Horse on Mississippi
Canyon block 778 and surrounding blocks in the Boarshead
Basin 125 miles southeast of New Orleans. The new field has
estimated resources of at least one billion barrels of oil
equivalent -- the largest discovery ever in the Gulf deep
water, according to the company.
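
The slide names a Base GDM, an Energy GDM, and a Custom Gazetteer applied to the same text. One way to picture that layering is sketched below: each layer is consulted in turn, and every hit is labeled with the layer that produced it. The layer contents, the lease-block pattern, and all names are hypothetical illustrations, not MetaCarta's actual data or algorithm.

```python
import re

# Hypothetical lookup layers; the entries are illustrative only.
BASE_LAYER = {"New Orleans": "city", "Gulf of Mexico": "water body"}
ENERGY_LAYER = {"Crazy Horse": "field", "Boarshead Basin": "basin"}
CUSTOM_LAYER = {}  # a customer could add its own names here

# Assumed pattern for lease blocks such as "Mississippi Canyon block 778".
BLOCK_PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) block (\d+)")

def find_references(text):
    """Return (matched text, kind, layer name) tuples found in the text."""
    found = []
    layers = (("base", BASE_LAYER), ("energy", ENERGY_LAYER), ("custom", CUSTOM_LAYER))
    for layer_name, layer in layers:
        for name, kind in layer.items():
            if name in text:
                found.append((name, kind, layer_name))
    # Domain-specific pattern, treated here as part of the energy layer.
    for match in BLOCK_PATTERN.finditer(text):
        found.append((match.group(0), "lease block", "energy"))
    return found

sample = ("The most important of these new finds is Crazy Horse on Mississippi "
          "Canyon block 778 and surrounding blocks in the Boarshead Basin "
          "125 miles southeast of New Orleans.")
results = find_references(sample)
for item in results:
    print(item)
```

The point of the sketch is that a general-purpose layer would miss "Crazy Horse" or a numbered lease block, while an industry-specific layer can recognize them.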

13
Geographic Text Search

Create a searchable index of
geographic locations and key words
contained within unstructured
documents, located in disparate
data sources

Puts documents
on the map
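
A minimal sketch of such an index follows, assuming each document has already been assigned one or more point locations. The class, its methods, the document ids, and the coordinates are all hypothetical; a production index would use a proper spatial data structure rather than a linear scan.

```python
from collections import defaultdict

class GeoTextIndex:
    """Toy index mapping keywords and point locations to document ids."""

    def __init__(self):
        self.keyword_to_docs = defaultdict(set)
        self.doc_locations = defaultdict(list)  # doc id -> [(lat, lon), ...]

    def add(self, doc_id, text, locations):
        for word in text.lower().split():
            self.keyword_to_docs[word.strip(".,;:")].add(doc_id)
        self.doc_locations[doc_id].extend(locations)

    def search(self, bbox, keyword=None):
        """bbox = (south, west, north, east); keyword=None matches any text."""
        south, west, north, east = bbox
        in_box = {
            doc_id
            for doc_id, points in self.doc_locations.items()
            if any(south <= lat <= north and west <= lon <= east for lat, lon in points)
        }
        if keyword is None:
            return in_box
        return in_box & self.keyword_to_docs.get(keyword.lower(), set())

index = GeoTextIndex()
index.add("report-1", "Geochemical exploration results, Algeria.", [(28.0, 3.0)])
index.add("memo-7", "Deep water seismic survey.", [(28.2, -89.0)])

north_africa = (15.0, -10.0, 38.0, 35.0)  # rough, made-up bounding box
hits_any = index.search(north_africa)
hits_seismic = index.search(north_africa, "seismic")
print(hits_any, hits_seismic)
```

Searching with a map extent and no keyword returns everything located in that area, which is what lets documents be browsed by place rather than by folder or file name.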

14
Geospatial Knowledge Discovery

Single search across many sources
Secure access

BU - Specific Data
Corporate Knowledge Base
Internal Spatial Data, Map services
External Subscription Data
External Free Web Data

Shared Drives
Knowledge Portals
Archives
Corporate Websites
Internal Network (Peers)
Published Documents
Library Catalogs
GIS Layers
Science and Industry Content Providers
E&P Industry Content Providers
Geotagged News
Competitive Intelligence

15
Data Fusion

Combine Data Search and Visualization
Increase the value of existing systems

Commercial Data Sources
Mapping & GIS & Visualization
Document Management Systems

16
What information do we have ?

Research / Information discovery:
What data, analysis and reports do we have about Libya?
1. Zoom to Libya using the Location Finder
2. Conduct an Empty Search
3. Browse the search results

17
What information do we have ?

18
What information do we have ?

19
Standard Text Search

20
The Impossible Search

21
Customer Uses

Find information from past projects
Competitive analysis
Locate information when assigned to a new area
Merge technical libraries
Identify / Index documents obtained through M & A
Research
Support lease sales
Spatially locate lease and contract documents
Regulatory compliance

22
Competitor Analysis

Filter
Geographically

Filter
by TIME

Competitor Analysis:
ChevronTexaco activities in/near Angola
within the last 5 years
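
The two filters on this slide can be pictured as a bounding-box test combined with a date cutoff. The sketch below assumes search hits already carry a location and a publication date; the `Hit` class, the sample hits, and the rough bounding box are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Hit:
    title: str
    lat: float
    lon: float
    published: date

def filter_hits(hits, bbox, since):
    """Keep hits inside bbox = (south, west, north, east) published on or after `since`."""
    south, west, north, east = bbox
    return [
        h for h in hits
        if south <= h.lat <= north and west <= h.lon <= east and h.published >= since
    ]

# Made-up hits and a rough, approximate box around Angola.
hits = [
    Hit("doc-a", -8.8, 13.2, date(2004, 5, 1)),
    Hit("doc-b", -8.8, 13.2, date(1998, 1, 1)),
    Hit("doc-c", 29.8, -95.4, date(2005, 3, 1)),
]
approx_box = (-19.0, 10.0, -4.0, 25.0)
five_years_back = date(2001, 7, 20)  # five years before the presentation date

recent_nearby = filter_hits(hits, approx_box, five_years_back)
print([h.title for h in recent_nearby])
```

Here only the first hit passes both filters: the second is too old and the third lies outside the box.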

23
User Comments
"This is a search tool that talks the language of the knowledge worker"

"Leveraging a powerful search tool that is geographically based, brings
a new dimension of value to search results."

"This is Google for the oil industry"

"It seems to me that not having it will be a competitive disadvantage"

"We're finding new plays with this"

"Every day, all day long, I need this tool"

"What this gives me is a tool that shows me what I do not know."

24
Thank you for your time

Questions ?

25
Recap
IHS and MetaCarta
The largest collection of E&P spatial data and use of Natural
Language Processing to locate all your documents using a map
Jointly developed and owned Geographic Data Modules

Geographic Text Search


Complements existing investment in GIS and
Document Management Systems
Addresses need to simultaneously search and visualize multiple
document stores distributed globally
Provides results in the context of your daily business

26
ROI
Initial trial indicated at least 2% (1 hour/week) of workers' time could be
saved using geographic text search
If implemented across the enterprise, this translates to:
(1 hour per week * 50 weeks) * ($150k FTE per annum / 2000 hours) * 10k employees
= 50 hours per annum * $75/hour * 10k employees (1)
= $37.5mm/annum cost savings

(1) Assumption: 10% workforce take-up across enterprise of 100,000 global workforce
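
The arithmetic on this slide can be checked with a few lines of code. The function and parameter names below are illustrative; the input figures are the ones stated on the slide.

```python
def annual_savings(hours_saved_per_week, weeks_per_year, cost_per_fte,
                   hours_per_fte_year, workforce, take_up_percent):
    """Estimated yearly savings from time saved by the workers who adopt the tool."""
    hourly_cost = cost_per_fte / hours_per_fte_year   # $150k / 2000 h = $75/h
    users = workforce * take_up_percent // 100        # 10% of 100,000 = 10,000
    return hours_saved_per_week * weeks_per_year * hourly_cost * users

savings = annual_savings(
    hours_saved_per_week=1,
    weeks_per_year=50,
    cost_per_fte=150_000,
    hours_per_fte_year=2000,
    workforce=100_000,
    take_up_percent=10,
)
print(f"${savings / 1e6:.1f}mm per annum")  # prints "$37.5mm per annum"
```

Note that 2000 hours over 50 weeks is a 40-hour week, so 1 hour per week is 2.5% of working time, consistent with the "at least 2%" figure.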

27
The Solution

The World With MetaCarta

E-mail
Messages/Cables
Correspondence
Reports
Analyses
Web Pages
News Feeds
Databases
Maps & Imagery

Unstructured Content
GIS Systems
Complete Overlap
Company Proprietary
28
The Problem

The World Without MetaCarta

E-mail
Messages/Cables
Correspondence
Reports
Analyses
Web Pages
News Feeds
Databases
Maps & Imagery

Unstructured Content
GIS Systems
Trivial Overlap
Area of Pain

29
The MetaCarta and IHS
Relationship

7.6 million international E&P placenames
Wells
Fields
Basins
Contracts

Natural Language Processing tuned to the specific requirements of E&P
10 million plus placenames

IHS and MetaCarta are in discussions to jointly
develop and own Geographic Data Modules (GDMs)
for the energy market

30
