Creating Structure in Unstructured Data
What is possible, today…?



Marco Gralike
Hotsos 2013 - Creating Structure in Unstructured Data
“Big Data” = XML ?
Challenges are!
Ahum, the problems are!
WikiPedia
• One string of XML data with
  structured and unstructured
  data sections
• Language: English
• Size      : 42.15 GB
• Pages     : 12,961,997
• Date      : 21 Dec 2012
Adventures into
the unknown…?
Setup
• VirtualBox VM
  – OEL 5U8 (64)
  – 8 GB RAM
• LaCie Little Big Disk
  – RAID 0
  – Thunderbolt
• Database
  – SGA    4GB
  – PGA    2GB
My new LaCie LBD is really fast
Defeat?! - 1,000,000 pages only
Status of Technology used
XML - Where are we…?




Gartner
Achieved…?
On the Horizon!
• JSONiq
• Zorba
Building (streaming) Bridges
Oracle XML DB
      • NO cost option
      • C (native / embedded kernel)
      • (XQuery) Standards
      • Code maintained by Oracle
XQuery
[Diagram: XQuery processing layers, rows listed top to bottom]
• XMLType Abstraction
• DB XQuery, Procedural XQuery
• XQuery Rewrite, Pushdown, XVM (use “no query rewrite”)
• Relational Access Methods, Streaming XPath Evaluation, DOM Tree Model
• SQL Execution, XMLIndex
• Object-Relational, Binary XML
• Relational Storage, Secure Files
Source: S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text
So what are we talking about?
WikiPedia
• Structured & Unstructured
  bits and pieces
• A lot of “unbounded”
  elements
• Not a lot of restrictions
• The bit with value is in
  element “text”
How do we get this Structured?
Strings = small & defined (12c?)

   Ename  pointer += 100;
<string1/><string2/><string3/>
Flexible, Humans
No Design Patterns
<small/><verybigggr/><bigger/>
<verybigggr>
       <empno>1</empno><ename>Marco</ename>
       <empno>2</empno>
</verybigggr>




 <small/><verybigggr/><bigger/>
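The contrast between fixed-size strings and free-form XML elements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 100-byte record size is taken from the "pointer += 100" remark, while the helper names, the sample names, the wrapping <root> element, and the assumption that every name fits in 100 bytes are made up for the example.

```python
import xml.etree.ElementTree as ET

RECORD_SIZE = 100  # bytes per record, mirroring "pointer += 100" (assumed layout)


def make_fixed_width(names):
    """Pack each name into a fixed 100-byte slot, padded with spaces.

    Assumes every encoded name is at most RECORD_SIZE bytes long.
    """
    return b"".join(name.encode("utf-8").ljust(RECORD_SIZE) for name in names)


def nth_fixed(buffer, n):
    """Jump straight to record n: one multiplication, no scanning."""
    start = n * RECORD_SIZE
    return buffer[start:start + RECORD_SIZE].rstrip().decode("utf-8")


def nth_xml_child(xml_text, n):
    """Element sizes vary, so the text must be parsed to locate child n."""
    root = ET.fromstring(xml_text)
    return list(root)[n].tag


fixed = make_fixed_width(["Marco", "Scott", "King"])

# Hypothetical <root> wrapper around the fragment shown on the slides.
doc = (
    "<root><small/>"
    "<verybigggr><empno>1</empno><ename>Marco</ename><empno>2</empno></verybigggr>"
    "<bigger/></root>"
)

print(nth_fixed(fixed, 1))
print(nth_xml_child(doc, 2))
```

With fixed-width records the position of record n is pure arithmetic; with XML, the size of <verybigggr> is unknown until it has been read, so reaching <bigger/> means processing everything before it.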
We need options!
“XMLType” Container
• In Memory (document)
• CLOB (document)
• Object Relational (data)
• Binary XML (data)
XMLType: In Memory (document)
• XOB
• XML Schema
XMLType: Binary XML Securefile (document/content)
• Post Parse
• LOB Index
XMLType: Object Relational (content)
• Fully Shredded
• Indexes
Something else to realize!
“What is the fastest way to get this stuff in the database…?”
“…it depends…”
“So what is the fastest way to get XML in the database…?”
“…it depends…”
“So what is the fastest way to get XML in the database… and useful in my case…?”
Garbage IN – Garbage OUT
WikiPedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile LOB Column
•   2.5 hours

And no (performant) way
to get the details out…
a.k.a “completely useless”
WikiPedia
•   SQL*Loader
•   Parallel or Direct
•   Securefile Binary XML
•   …2.5 hours ???
XML Parsing




• SAX   - Simple API for XML
• DOM   - Document Object Model
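The difference between the two parsing styles can be sketched with the Python standard library. This is an illustration only, not how Oracle XML DB parses internally; the sample document and its element names (mediawiki, page, title, text) and the helper names are assumptions made for the example.

```python
import xml.sax
from xml.dom import minidom

# Tiny stand-in for a dump file (assumed structure, not real data).
SAMPLE = (
    "<mediawiki>"
    "<page><title>A</title><text>first</text></page>"
    "<page><title>B</title><text>second</text></page>"
    "</mediawiki>"
)


class PageCounter(xml.sax.ContentHandler):
    """SAX: receives one event per element while the parser streams the input."""

    def __init__(self):
        super().__init__()
        self.pages = 0

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1


def count_pages_sax(xml_text):
    """Count pages without ever holding the whole document as a tree."""
    handler = PageCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.pages


def count_pages_dom(xml_text):
    """DOM: the complete document is built in memory before it is queried."""
    dom = minidom.parseString(xml_text)
    return len(dom.getElementsByTagName("page"))


print(count_pages_sax(SAMPLE), count_pages_dom(SAMPLE))
```

Both functions return the same count; the SAX version keeps only a counter in memory, while the DOM version holds the full tree, which matters for a document of tens of gigabytes.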
[Figure: storage options placed between “fast insert performance” and “fast select performance”, in order of appearance: CLOB, XMLType CLOB, (domain) indexes, XMLType Binary XML, XMLType Object Relational]
XML Partitioning
• Object Relational Partitioning
  – Equi-Partitioning since version Oracle 11.1.0.7.0
• Binary XML Partitioning
  – Range, List, Hash
• Local partitioned XMLIndex
  – LOCAL keyword in XMLIndex create syntax
• Partition Key on virtual column (Binary XML)
• Partition Key on column (Object Relational)
XMLType: Binary XML Securefile (document/content)
• Post Parse
• LOB Index
Driving access on CONTENT
[Diagram: a bookstore document tree annotated with index types]
• bookstore
  – book: title, author, author, chapter
  – whitepaper: title, author, id, paragraph
• Index types shown: BTree Index, Function based Index (XPath), Unstructured XMLIndex (content), Structured XMLIndex (structured content), Oracle XML Text Index
Structured Data
Structured XMLIndex (SXI)
• CONTENT TABLE(s)
• Based on XMLTABLE syntax
• XMLTable construct can be nested:
  – VIRTUAL column alias
• Can be maintained manually
• Secondary indexes possible
[Diagram: Structured XMLIndex, f (x), Content Tables]
Describe CONTENT TABLE




• A “regular” heap table with columns…
• Ideal for secondary indexes, if needed.
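The idea of a content table, fixed paths projected out of each document into ordinary columns, can be sketched as follows. This is a loose Python analogy using an in-memory SQLite table, not Oracle's XMLIndex implementation; the table name, column names, index name, and sample documents are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents.
DOCS = [
    "<page><id>1</id><title>Oracle</title></page>",
    "<page><id>2</id><title>XML</title></page>",
]


def build_content_table(conn, docs):
    """Project two fixed paths (id, title) of each document into plain columns."""
    conn.execute(
        "CREATE TABLE content_tab (doc_no INTEGER, id INTEGER, title TEXT)"
    )
    for doc_no, doc in enumerate(docs):
        root = ET.fromstring(doc)
        conn.execute(
            "INSERT INTO content_tab VALUES (?, ?, ?)",
            (doc_no, int(root.findtext("id")), root.findtext("title")),
        )
    # A secondary index on a projected column, as on any regular table.
    conn.execute("CREATE INDEX content_tab_title_ix ON content_tab (title)")


conn = sqlite3.connect(":memory:")
build_content_table(conn, DOCS)
rows = conn.execute(
    "SELECT id FROM content_tab WHERE title = ?", ("XML",)
).fetchall()
print(rows)
```

Once the values sit in plain columns, a lookup by title becomes an ordinary indexed query and no XML has to be parsed at query time.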
CONTENT TABLE(s)
[Diagram: Structured XMLIndex, f (x), Content Tables]
Semi-Structured Data
Unstructured XMLIndex (UXI)
• PATH TABLE
• Use Path Subsetting
   – Full Blown XMLIndex can be BIG
• Token Tables (XDB.X$......)
   – Query re-write on Tokens
   – Fuzzy Searches, //
   – Optimizer Statistics
• Can be maintained manually
   – Recorded in Pending Table
• Secondary indexes possible
[Diagram: Unstructured XMLIndex, f (x), Path Table]
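A path table and the effect of path subsetting can be illustrated with a small Python sketch. It is a deliberately simplified assumption: each row here is just a (path, value) pair, whereas the real path table carries more information, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_rows(xml_text, include=None):
    """Return one (path, value) row for every element that has text.

    include: optional set of paths to keep, a stand-in for path subsetting.
    """
    rows = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        text = (elem.text or "").strip()
        if text and (include is None or path in include):
            rows.append((path, text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows


# Hypothetical sample document.
DOC = (
    "<book><title>XML DB</title><author>Marco</author>"
    "<chapter><title>Intro</title></chapter></book>"
)

full = path_rows(DOC)
subset = path_rows(DOC, include={"/book/title"})
print(len(full), len(subset))
```

Indexing every path produces a row per text node, which is why a full-blown index over a large document set grows quickly; restricting the indexed paths keeps only the rows that queries actually need.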
Describe PATH TABLE
What’s hidden…
PATH TABLE
[Diagram: Unstructured XMLIndex, f (x), Path Table]
Binary XML – No Index
Binary XML + XMLIndex (SXI)
Binary XML + XMLIndex + Sec.Ind.
Binary XML + XMLIndex + Sec.Ind.
Un-Structured Data
XML Full Text Index
• Based on Oracle Text Index, XQuery Full Text
• XML Namespace Aware
• XML Semantic aware full text search
  – Full-Text Selection Expression – contains text
  – Logical Full Text Operator – ftor, ftand, ftMildNot
  – Context Aware full text search
Balanced Design
• Inserts, Updates & Deletes
  – XML Future Changes
  – Index Maintenance
• Selects
  – In Memory
  – Via Indexes
• XML Validation
  – Strict, Lazy
  – Client Side Possibilities
[Diagram labels: In Memory, On Disk]
Reward
• Optimal performance
• Outperforming XML
• Proper design will give
  performance increase over
  XML handling…


…proper design is still key…
References
Oracle XML DB
  – http://www.oracle.com/pls/db112/homepage
XML DB FAQ Thread
  – http://forums.oracle.com/forums/thread.jspa?threadID=410714
Personal Blog
  – http://www.xmldb.nl
  – http://technology.amis.nl
References
Daniela Florescu, Oracle Corporation
  Advances in XML and XQuery
Sam Idicula, Oracle XML DB Development Team
  Binary XML Storage and Query Processing in Oracle
Jinyu Wang, Scott Brewton
  Making XML Technology Easier to Use
Joel Spolsky - Joel on Software
  Back to Basics
References
Oracle XML DB Main page material
• Oracle XML DB : Best Practices to Get Optimal
  Performance out of XML Queries (PDF)
• Oracle XML DB : Choosing the Best XMLType
  Storage Option for Your Use Case (PDF)
• A Request for Comments for the Oracle Binary
  XML Format

Editor's Notes

  • #19: See also OOW 2010, S317428: Building Really Scalable XML Applications with Oracle XML DB and Oracle Text – Nipun Agarwal, Oracle