F I L E S
XML, XPath, XSLT...
the 'X' files?
ATA 56th Annual Conference, Miami
Paul Filkin pfilkin@sdl.com
November 4 -7, 2015
2
What is XML?
eXtensible Markup Language
+
3
When it’s simple…
4
… it’s easy!
SDL Trados Studio memoQ CafeTran
5
When it’s not so simple…
6
… it gets a little trickier!
SDL Trados Studio
memoQ
CafeTran
7
Some tools can still handle these…
… in a friendly way
SDL Trados Studio Déjà Vu
8
But what happens if the file looks like this?
9
But what happens if the file looks like this?
10
But what happens if the file looks like this?
11
But what happens if the file looks like this?
12
Now we need to get specific…
• XPath is a syntax for defining parts
of an XML document
• XPath uses path expressions to
navigate in XML documents
• XPath contains a library of
standard functions
• XPath is a major element in XSLT
• XPath is a W3C recommendation
13
XPath’s purpose is to locate
any part of an XML document
14
XPath Terminology
(151030) The X files
16
Studio gave us a clue earlier…
//text/@line
17
Studio gave us a clue earlier…
//text/@line
18
//text/@line or /atts/title/verse1/text/@line
c:\Users\pfilkin\Documents\SDL\Presentations\20151104 (Miami - ATA)\Presentations\The X Files\Files\02 – not so simple.xml
19
//text/@line or /atts/title/verse1/text/@line
c:\Users\pfilkin\Documents\SDL\Presentations\20151104 (Miami - ATA)\Presentations\The X Files\Files\02 – not so simple.xml
Now carry on drilling down…
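A minimal Python sketch of the two expressions above, assuming a small made-up file shaped like the path on the slide (the element contents are hypothetical, not the presenter's file). Python's standard xml.etree.ElementTree understands only a subset of XPath and cannot return attribute nodes, so the final @line step is done with .get().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the path /atts/title/verse1/text/@line.
SAMPLE = """
<atts>
  <title name="Song">
    <verse1>
      <text line="First line of verse one"/>
      <text line="Second line of verse one"/>
    </verse1>
    <verse2>
      <text line="First line of verse two"/>
    </verse2>
  </title>
</atts>
"""

root = ET.fromstring(SAMPLE)  # root is the <atts> element

# //text/@line : every <text> element anywhere in the document,
# then its line attribute (read with .get() because ElementTree
# cannot select attribute nodes directly).
anywhere = [t.get("line") for t in root.findall(".//text")
            if t.get("line") is not None]

# /atts/title/verse1/text/@line : the fully spelled-out path,
# written relative to the root <atts> element.
verse1_only = [t.get("line") for t in root.findall("./title/verse1/text")
               if t.get("line") is not None]

print(anywhere)
print(verse1_only)
```

The short form picks up the line attributes in every verse; the long form only reaches the ones under verse1.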
Helpful and free
XPath Tools
21
xmltree: http://xpathexplorer.sourceforge.net/
XPath expression
is returned here
Results of the XPath
are returned here
Select the XML Node
22
memoQ multilingual XML
Select the XML Node
XPath expression
is returned here
23
XMLQuire: http://qutoric.com/xmlquire/
Type your XPath expression
Highlights one result
at a time in the XML file
XPath info box
24
XPath Visualizer: http://xpathvisualizer.codeplex.com/
Type your XPath expression
Highlights all results
in the XML file
A few XPath Examples
26
A few basics… element nodes
○ Use //* to extract all elements
○ Use //simpleelement to extract any text in this element
– 1, 2, 3, 4, 5, 6, 7
○ Use //nestedelement/simpleelement to extract text only from the
simpleelement children of nestedelement, when the same child
element also appears under other parent elements
– 2
<title lang='en'>
element
attribute
27
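A rough Python sketch of the three element expressions above, using the standard xml.etree.ElementTree module. The sample XML is an assumption made up for illustration (the presenter's numbered sample file is not reproduced here), so the results differ from the numbers on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same child element under different parents.
SAMPLE = """
<root>
  <simpleelement>one</simpleelement>
  <nestedelement>
    <simpleelement>two</simpleelement>
  </nestedelement>
  <otherparent>
    <simpleelement>three</simpleelement>
  </otherparent>
</root>
"""

root = ET.fromstring(SAMPLE)

# //* : every element in the document (iter() includes the root element).
all_elements = [el.tag for el in root.iter()]

# //simpleelement : every simpleelement, whatever its parent.
simple_texts = [el.text for el in root.findall(".//simpleelement")]

# //nestedelement/simpleelement : only those directly inside nestedelement.
nested_texts = [el.text for el in root.findall(".//nestedelement/simpleelement")]

print(all_elements)
print(simple_texts)
print(nested_texts)
```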
A few basics… attribute nodes
○ Use //*/@* to extract all attributes
○ Use @translateatt to translate any translateatt attribute
– 3a, 5a
<title lang='en'>
element
attribute
28
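The attribute expressions can be approximated in Python as below. This is only a sketch: xml.etree.ElementTree cannot return attribute nodes, so the attributes are read from each element's attrib dictionary instead, and the sample XML (element name item, attribute values) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: some elements carry a translatable attribute.
SAMPLE = """
<root>
  <item id="3" translateatt="Tooltip text">Body three</item>
  <item id="4">Body four</item>
  <item id="5" translateatt="Alt text">Body five</item>
</root>
"""

root = ET.fromstring(SAMPLE)

# //*/@* : every attribute of every element, as (element, name, value).
all_attributes = [(el.tag, name, value)
                  for el in root.iter()
                  for name, value in el.attrib.items()]

# //@translateatt : only the translateatt attribute, wherever it occurs.
translatable = [el.get("translateatt")
                for el in root.iter()
                if "translateatt" in el.attrib]

print(all_attributes)
print(translatable)
```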
A few basics… putting statements together
○ Use //* | //*/@* to extract all elements and all attributes
together in one statement
29
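The | operator joins two node selections into one. A small Python sketch of the same idea, assuming a made-up two-element document: it walks every element once and collects the element text and the attribute values together, which is roughly what //* | //*/@* selects. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with text in elements and in attributes.
SAMPLE = '<doc><title lang="en">Hello</title><para note="Greeting">World</para></doc>'


def extract_elements_and_attributes(root):
    """Collect every element and every attribute in one pass."""
    found = []
    for el in root.iter():
        # The //* half of the union: the element itself and its leading text.
        found.append(("element", el.tag, (el.text or "").strip()))
        # The //*/@* half of the union: each attribute of that element.
        for name, value in el.attrib.items():
            found.append(("attribute", name, value))
    return found


results = extract_elements_and_attributes(ET.fromstring(SAMPLE))
for kind, name, value in results:
    print(kind, name, repr(value))
```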
What else makes up an XPath?
My clever dog Regex barked at the mailman
30
What else makes up an XPath?
My clever dog Regex barked at the mailman
Complete Subject
31
What else makes up an XPath?
My clever dog Regex barked at the mailman
Complete Subject Predicate
Expresses what the subject does
32
What else makes up an XPath?
My clever dog Regex barked at the mailman
Complete Subject Predicate
Expresses what the subject does
barked
Always includes a verb, can also include other descriptive words
33
What else makes up an XPath?
//My clever dog Regex[barked, 'at the mailman']
34
What else makes up an XPath?
//My clever dog Regex[barked, 'at the mailman']
Node test
35
What else makes up an XPath?
//My clever dog Regex[barked, 'at the mailman']
Node test Predicate
Narrows down the Node test
36
What else makes up an XPath?
//My clever dog Regex[barked, 'at the mailman']
Node test Predicate
Narrows down the Node test
barked
Does not have to include a node, but always includes other descriptive tests
37
Predicates…
○ Use //*[@translate='yes'] to extract any text in any element
with this attribute value
– 3, 6, 8
○ Remember @translateatt
Now we’ll use //*[not(@translate="no")]/@translateatt to
conditionally translate this same attribute
– 5a
38
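A hedged Python sketch of the two predicates above. The first one is supported directly by xml.etree.ElementTree; the second uses not(), which ElementTree does not understand, so the condition is written as an ordinary Python test instead. The sample XML is an assumption and does not match the numbered results on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a translate flag plus a translatable attribute.
SAMPLE = """
<root>
  <p id="3" translate="yes">Translate me</p>
  <p id="4" translate="no" translateatt="Skip this attribute">Leave me</p>
  <p id="5" translateatt="Translate this attribute">No translate flag here</p>
</root>
"""

root = ET.fromstring(SAMPLE)

# //*[@translate='yes'] : any element whose translate attribute is 'yes'.
yes_texts = [el.text for el in root.findall(".//*[@translate='yes']")]

# //*[not(@translate="no")]/@translateatt : the translateatt attribute,
# but only on elements that are not marked translate="no".
# Elements with no translate attribute at all still qualify.
conditional = [el.get("translateatt")
               for el in root.iter()
               if el.get("translate") != "no" and "translateatt" in el.attrib]

print(yes_texts)
print(conditional)
```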
Functions and Operators…
○ Use //*[contains(text(), 'ATA56')] to extract the contents of
any segments containing the text ATA56
– 1, 9, 11
○ Use //condition[answer/text()='42']/extract to extract the
contents of the extract element if the value of the answer
element is 42
– 13
39
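The two expressions above can be approximated in Python as follows. This is a sketch under assumptions: the sample XML is made up, and because xml.etree.ElementTree has no contains() function, that test is emulated with a small hypothetical helper, first_text(), that looks at the first piece of text directly inside each element. The second expression maps onto ElementTree's own [child='value'] predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for both expressions.
SAMPLE = """
<root>
  <seg id="1">Welcome to ATA56 in Miami</seg>
  <seg id="2">Nothing to see here</seg>
  <condition><answer>42</answer><extract>Translate this one</extract></condition>
  <condition><answer>41</answer><extract>Not this one</extract></condition>
</root>
"""


def first_text(el):
    """Return the first text directly inside el, or '' if there is none."""
    if el.text:
        return el.text
    for child in el:
        if child.tail:
            return child.tail
    return ""


root = ET.fromstring(SAMPLE)

# //*[contains(text(), 'ATA56')] : elements whose first text contains ATA56.
containing = [el.text for el in root.iter() if "ATA56" in first_text(el)]

# //condition[answer/text()='42']/extract : the extract element, but only
# inside a condition whose answer element is 42.
extracted = [el.text for el in root.findall(".//condition[answer='42']/extract")]

print(containing)
print(extracted)
```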
Namespaces…
○ Use //*[local-name()='strong'] instead of //strong to extract
text from all elements irrespective of the use of namespaces
– 14, 15, 16
40
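A short Python sketch of why local-name() helps, assuming a made-up document with two hypothetical namespaces (the example.com URIs are placeholders). xml.etree.ElementTree stores a namespaced tag as {uri}name, so the local name is whatever follows the closing brace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same element name with and without namespaces.
SAMPLE = """
<doc xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">
  <strong>plain</strong>
  <a:strong>from namespace a</a:strong>
  <b:strong>from namespace b</b:strong>
  <a:em>not strong</a:em>
</doc>
"""


def local_name(tag):
    """Strip the {namespace-uri} prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# //strong : only matches the element that is in no namespace.
plain_only = [el.text for el in root.findall(".//strong")]

# //*[local-name()='strong'] : matches strong whatever its namespace.
any_namespace = [el.text for el in root.iter() if local_name(el.tag) == "strong"]

print(plain_only)
print(any_namespace)
```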
Miscellaneous uses…
○ Use //xpath/text() to use content from the xml file as
descriptive text for the DSI column in SDL Trados Studio
41
Miscellaneous uses…
○ Use attribute values to control segment lengths in the
advanced options, so @max or @min for max=‘50’ or
min=‘5’ in SDL Trados Studio
42
Reference…
http://www.w3.org/TR/xpath/
43
44
What is XSLT?
eXtensible Stylesheet Language Transformations
 
45
XSLT uses XPath!
46
Looks familiar? Creating your template
47
Looks familiar? Creating your template
Selecting XML comments
48
Looks familiar? Creating your template
Selecting XML comments
Selecting XML attribute values
49
An improved experience…memoQ
50
An improved experience…memoQ
51
An improved experience…SDL Trados Studio
52
An improved experience…SDL Trados Studio

Editor's Notes

  • #3: XML is a software- and hardware-independent tool for storing and transporting data. It doesn’t do anything… it just stores data that’s used somewhere else, often in multiple places. Why is it called “XML”? My guess is that “XML” looked a lot cooler than “EML”. The important part in here is actually the eXtensible bit… So XML has rules… but the developer has complete freedom to extend these and use them as he sees fit. So there are no predefined tags as there are in HTML, for example.
  • #4: All the translatable text is in elements and all the elements are translatable
  • #5: Studio, memoQ, CafeTran
  • #6-#7: When only the title is in an element and all the lines have been moved into attributes
  • #8: So Studio would allow you to create a custom XML filetype: you import the file and it extracts the elements and attributes for you to easily select with a dropdown box. Déjà Vu lists all the elements and attributes so you go through each one and tell it how to handle them. But not all tools allow you to handle this at all… in my example CafeTran does not let you customise XML requirements at all, so you would not be able to handle these files with CafeTran.
  • #9-#12: In this example we have valid XML but it starts to get a little trickier… There are CDATA sections, which have the same text element but may not be an element you want to translate. There are alternative translations that might be helpful as a preview. Also the text elements are nested in verse# elements, so multiple rules could be required if you needed to be specific. There are max/min text lengths, maybe to suit some fixed window. And there is a translate=“n” attribute.
  • #13: So this is nothing specific to Studio or memoQ, for example… this is a World Wide Web Consortium (W3C) recommendation as a web standard. So it’s easy to find information on how to learn this. We’re going to look at some basics to get you started.
  • #14: What all that means is just this…
  • #15: There isn’t a lot of theory around this, but it does take a little digesting and plenty of time to get your head around it all. It’s also all worth learning.
  • #16: But if at this point you’re starting to feel like he does… then don’t worry… we only have an hour!! Of course it’s important to know all this stuff if you want to be completely comfortable with XPath, but for our normal day to day needs I don’t think you do. So let’s try and cut to the chase
  • #17-#18: Here we see two of a possible seven kinds of XPath nodes. An element and an attribute. “text” was the element simply referred to by name, and “line” was the attribute also referred to by name, but recognised as an attribute by using the @ symbol.
  • #19-#20: It’s a little like a file path. We all know how these work and the path takes you straight to the file. Now add the XPath expression we just saw on the end… The // just means select the text element no matter where in the document it is.
  • #23: Worth mentioning… but not free. A small built-in helper tool is available when using the memoQ multilingual XML filetype.
  • #30-#33: Let’s consider this simple sentence and break it down into a few simple components.
  • #34-#37: Now let’s take the same sentence and mark it up like this. This is basically how an XPath expression works using predicates. The node test would be the complete subject, and the predicate is always enclosed in square brackets.
  • #40: This is all about conflict resolution. Namespaces provide a way to avoid element name conflicts, but sometimes these cause complexity for the simple task of extracting all the text for translation.
  • #45: XSLT is used to transform an XML document into another XML document, or another type of document that is recognized by a browser, like HTML and XHTML. With XSLT you can add/remove elements and attributes to or from the output file. You can also rearrange and sort elements, perform tests and make decisions about which elements to hide and display, and a lot more.
  • #50-#51: memoQ can’t handle images unless they are referred to from a URL (confirmed by Gábor Ugray) but the style and flow of the document is there.