Using XML with the Progress 4GL
Gus Bjorklund
Contents
Introduction
What Is XML?
4GL Representation of the DOM
  The DOM Structure Model
The New 4GL Objects
  4GL Object Types
  4GL Document Objects
  4GL Node Reference Objects
  Creating Documents
  Creating Node References
  Creating Nodes
Character Sets and Encodings
Error Handling
Examples
  Example 1: XML Output from 4GL
  Example 2: XML Document Produced by Example 1
  Example 3: XML Input from 4GL
References
  XML
  DOM
  DOM Parser
  Unicode
Introduction
This white paper describes a set of Progress 4GL capabilities, introduced in Version 9.1A, that enable Progress applications to use Extensible Markup Language (XML) as a means of data exchange. XML is being rapidly and widely adopted throughout the computer industry and is becoming the preferred method for encoding data exchanged among applications. One of its most important future uses is business-to-business communication. XML is enormously useful for this purpose because it enables many different kinds of data exchange in a way that is standards-based, simple, and flexible, and therefore cheaper than was possible with previous methods. Here's what two respected industry analysts say about it:
"XML will revolutionize the exchange of business information similar to the way the phone, fax machine, and photocopier did when those devices were invented. Those [prior] innovations made a significant impact on how businesses viewed and exchanged information. XML is poised to impact the Internet area the same way." Ron Rappaport, Zona Research, December 31, 1998
"Work by the World Wide Web Consortium (W3C) to flesh out XML will give sellers a standard way to define catalogs and processes. By 2000, this technology will proliferate from the desktop to the back office creating a climate in which new buying models thrive and standards like OBI consolidate around XML." The Forrester Report, On-line Purchasing Futures, September 1998
What Is XML?
The Extensible Markup Language (XML) is a data format for structured document interchange on the Web. It is independent of hardware architecture and of any particular application. XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. The W3C's official recommendation for XML and a variety of related materials can be found at http://www.w3.org/XML/.
XML is a subset of another markup language, called SGML, which was adopted as an international standard in 1986 [ISO 8879]. SGML is in turn based on an earlier markup language, called GML, which was developed by researchers at IBM in 1969.
XML describes a class of data objects called XML documents and partially describes the behavior of computer programs that process them. XML is an application profile, or restricted form, of SGML; by construction, XML documents are conforming SGML documents. XML documents are composed of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data and some of which form markup. Markup encodes a description of a document's layout and logical structure, and XML provides a mechanism to impose constraints on that layout and structure.
A software module called an XML processor is used to read XML documents and provide access to their content and structure. An XML processor is assumed to be doing its work on behalf of another module, called the application. The XML specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
XML Documents
XML documents are text composed of two parts: a prologue and a body. The optional prologue may contain the version of XML the document conforms to, information about the character encoding used for the document's contents, and a document type definition (DTD), which describes the grammar and vocabulary of the document. The body may contain elements, entity references, and other markup.
Elements represent the logical components of documents. They can contain either data or other elements. For example, a phone list element could contain a number of phone list entry elements, and each entry element could contain the data values for a single entry. Here is an example of a simple element:
    <name>Clyde</name>
Elements can have additional information, called attributes, attached to them. Attributes describe properties of elements. Here is an example of an element with an attribute:
    <name emp-num="1">Mary</name>
Here is an example of some elements that contain other elements and data:
    <phonelist>
      <entry>
        <name>Chip</name>
        <extension>3</extension>
      </entry>
      <entry>
        <name>Gus</name>
        <extension>5</extension>
      </entry>
    </phonelist>
Document Type Definitions
There is an effectively unlimited variety of possible kinds of documents, such as a repair manual for a vehicle, a dictionary, a telephone directory, an order for equipment, an invoice, and so forth. Each kind of document can have a unique structure and organization that can be used over and over. The descriptions of classes of documents are called Document Type Definitions, or DTDs. A DTD is a set of rules that defines which elements can exist in a particular document or group of documents, and what the relationships among those elements are.
A DTD can be part of the content of an XML document, or it can be kept separate from the document and referred to by it. Here is an example of a document that includes a DTD in its prologue:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE customer [
      <!ELEMENT customer (name, cust-num)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT cust-num (#PCDATA)>
    ]>
    <customer>
      <name>Lift Line Skiing</name>
      <cust-num>1</cust-num>
    </customer>
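If a document like the one above is stored in a file, a 4GL program can ask the parser to check it against its DTD by passing TRUE as the third argument of the X-DOCUMENT LOAD() method. The sketch below assumes the document was saved as customer.xml, a hypothetical file name. It reports a parser message when loading does not succeed cleanly, and otherwise prints the name and text of each child element of <customer>. Treat it as an illustration, not a definitive implementation.

```abl
/* Sketch: load customer.xml (hypothetical file) with DTD validation requested. */
DEFINE VARIABLE hDoc  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRoot AS HANDLE  NO-UNDO.
DEFINE VARIABLE hElem AS HANDLE  NO-UNDO.
DEFINE VARIABLE hText AS HANDLE  NO-UNDO.
DEFINE VARIABLE lOk   AS LOGICAL NO-UNDO.
DEFINE VARIABLE i     AS INTEGER NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hElem.
CREATE X-NODEREF hText.

/* Third argument TRUE: validate against the DTD in the prologue. */
lOk = hDoc:LOAD("file", "customer.xml", TRUE) NO-ERROR.

IF NOT lOk OR ERROR-STATUS:NUM-MESSAGES > 0 THEN
    MESSAGE "customer.xml did not load cleanly:" ERROR-STATUS:GET-MESSAGE(1).
ELSE DO:
    hDoc:GET-DOCUMENT-ELEMENT(hRoot).   /* the <customer> element */
    DO i = 1 TO hRoot:NUM-CHILDREN:
        hRoot:GET-CHILD(hElem, i).
        /* Skip whitespace text nodes and empty elements. */
        IF hElem:SUBTYPE <> "ELEMENT" OR hElem:NUM-CHILDREN = 0 THEN NEXT.
        hElem:GET-CHILD(hText, 1).      /* the element's text node */
        MESSAGE hElem:NAME "=" hText:NODE-VALUE.
    END.
END.

DELETE OBJECT hText.
DELETE OBJECT hElem.
DELETE OBJECT hRoot.
DELETE OBJECT hDoc.
```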
The DOM
In an application, a raw XML document is cumbersome to work with because it is just one long string of characters. To allow an application to use data encoded in XML easily, something more is needed.
The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents. It defines the logical structure of XML documents and the way a document is accessed and manipulated. The W3C's official recommendation for the DOM and a variety of related materials can be found at http://www.w3.org/DOM/.
In the DOM specification, the term "document" is used in the broad sense. Increasingly, XML is being used to represent many different kinds of information that may be stored in diverse systems, and much of this would traditionally be seen as data rather than as documents. Nevertheless, XML presents this data as documents, and the DOM may be used to manage it.
With the Document Object Model, programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model, with a few exceptions; in particular, the DOM interfaces for the XML internal and external subsets have not yet been specified.
The DOM Level 1 specification is divided into two parts, Core and HTML. The HTML part provides additional, higher-level interfaces that are used with the fundamental interfaces defined in the Core Level 1 section to provide a more convenient view of an HTML document. The language extensions provided with Progress Version 9.1A cover the Core DOM Level 1 only.
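For example, consider this table, taken from an HTML document:
    <TABLE>
    <TBODY>
    <TR>
    <TD>Shady Grove</TD>
    <TD>Aeolian</TD>
    </TR>
    <TR>
    <TD>Over the River, Charlie</TD>
    <TD>Dorian</TD>
    </TR>
    </TBODY>
    </TABLE>
The corresponding DOM tree representation of the same table is shown below, one node per line, indented by depth:
    TABLE
      TBODY
        TR
          TD
            "Shady Grove"
          TD
            "Aeolian"
        TR
          TD
            "Over the River, Charlie"
          TD
            "Dorian"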
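To see the same tree structure from the 4GL side, a recursive internal procedure can walk a loaded document and print one line per node, indented by depth. This sketch assumes the table markup above was saved as table.xml, a hypothetical file name, and WalkNode is likewise a hypothetical procedure name. It skips text nodes that contain only whitespace, which the parser may create for line breaks between tags, so the remaining lines should mirror the DOM tree for the table.

```abl
/* Sketch: print the DOM tree of table.xml (hypothetical file holding the
   TABLE markup above), one node per line, indented by depth. */
DEFINE VARIABLE hDoc  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRoot AS HANDLE  NO-UNDO.
DEFINE VARIABLE lOk   AS LOGICAL NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.

lOk = hDoc:LOAD("file", "table.xml", FALSE) NO-ERROR.
IF lOk THEN DO:
    hDoc:GET-DOCUMENT-ELEMENT(hRoot).   /* the TABLE element */
    RUN WalkNode(hRoot, 0).
END.
ELSE
    MESSAGE "Could not load table.xml".

DELETE OBJECT hRoot.
DELETE OBJECT hDoc.

/* Hypothetical helper: print hNode, then recurse into its children. */
PROCEDURE WalkNode:
    DEFINE INPUT PARAMETER hNode  AS HANDLE  NO-UNDO.
    DEFINE INPUT PARAMETER iDepth AS INTEGER NO-UNDO.

    DEFINE VARIABLE hChild AS HANDLE  NO-UNDO.
    DEFINE VARIABLE i      AS INTEGER NO-UNDO.

    /* Text nodes: print the quoted content, skipping whitespace-only
       nodes produced by line breaks between tags. */
    IF hNode:SUBTYPE = "TEXT" THEN DO:
        IF TRIM(hNode:NODE-VALUE) <> "" THEN
            MESSAGE FILL(" ", iDepth * 2) + '"' + hNode:NODE-VALUE + '"'.
        RETURN.
    END.

    MESSAGE FILL(" ", iDepth * 2) + hNode:NAME.

    /* Each invocation has its own child reference, so the recursive
       call does not disturb this loop's position. */
    CREATE X-NODEREF hChild.
    DO i = 1 TO hNode:NUM-CHILDREN:
        hNode:GET-CHILD(hChild, i).
        RUN WalkNode(hChild, iDepth + 1).
    END.
    DELETE OBJECT hChild.
END PROCEDURE.
```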
This gives us two new object types in the 4GL for XML document manipulation: the X-DOCUMENT, which represents an entire XML document tree, and the X-NODEREF, which represents a reference to a single node in the XML document tree.
Creating Documents
The creation and persistent saving of a document is not part of the DOM Core API; it is left to the application (here, the Progress 4GL interpreter) that calls the DOM API. You create an XML document with the 4GL CREATE statement, and use the methods of the resulting object to save a new XML document or to load an existing one:
    CREATE X-DOCUMENT <handle>.
This statement creates a Progress object of type X-DOCUMENT, which wraps an XML document, and stores its handle in <handle>. You may start adding nodes to it right away, or use the LOAD() method to populate it from an existing XML document.
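As a sketch of starting from an empty document and adding nodes right away, the program below builds the phone list shown under "What Is XML?" and saves it. The output file name phonelist.xml and the internal procedures AddEntry and AddTextElement are hypothetical names chosen for this illustration; the 4GL calls themselves (CREATE X-DOCUMENT, CREATE X-NODEREF, CREATE-NODE, APPEND-CHILD, NODE-VALUE, SAVE) are the ones used in the examples in this paper. Each element's text is stored in a separate TEXT node appended as the element's child, as the DOM requires.

```abl
/* Sketch: build the <phonelist> document from scratch and save it.
   phonelist.xml, AddEntry and AddTextElement are hypothetical names. */
DEFINE VARIABLE hDoc  AS HANDLE NO-UNDO.
DEFINE VARIABLE hList AS HANDLE NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hList.

/* The root element <phonelist>. */
hDoc:CREATE-NODE(hList, "phonelist", "ELEMENT").
hDoc:APPEND-CHILD(hList).

RUN AddEntry("Chip", "3").
RUN AddEntry("Gus", "5").

hDoc:SAVE("file", "phonelist.xml").

/* Deleting the document object also deletes the node tree under it. */
DELETE OBJECT hList.
DELETE OBJECT hDoc.

/* Append <entry><name>...</name><extension>...</extension></entry>. */
PROCEDURE AddEntry:
    DEFINE INPUT PARAMETER pcName AS CHARACTER NO-UNDO.
    DEFINE INPUT PARAMETER pcExt  AS CHARACTER NO-UNDO.

    DEFINE VARIABLE hEntry AS HANDLE NO-UNDO.

    CREATE X-NODEREF hEntry.
    hDoc:CREATE-NODE(hEntry, "entry", "ELEMENT").
    hList:APPEND-CHILD(hEntry).

    RUN AddTextElement(hEntry, "name", pcName).
    RUN AddTextElement(hEntry, "extension", pcExt).

    DELETE OBJECT hEntry.   /* removes the reference, not the node */
END PROCEDURE.

/* Append <tag>value</tag> to hParent; the value goes in a child TEXT node. */
PROCEDURE AddTextElement:
    DEFINE INPUT PARAMETER hParent AS HANDLE    NO-UNDO.
    DEFINE INPUT PARAMETER pcTag   AS CHARACTER NO-UNDO.
    DEFINE INPUT PARAMETER pcValue AS CHARACTER NO-UNDO.

    DEFINE VARIABLE hElem AS HANDLE NO-UNDO.
    DEFINE VARIABLE hText AS HANDLE NO-UNDO.

    CREATE X-NODEREF hElem.
    CREATE X-NODEREF hText.

    hDoc:CREATE-NODE(hElem, pcTag, "ELEMENT").
    hParent:APPEND-CHILD(hElem).
    hDoc:CREATE-NODE(hText, "", "TEXT").
    hElem:APPEND-CHILD(hText).
    hText:NODE-VALUE = pcValue.

    DELETE OBJECT hText.
    DELETE OBJECT hElem.
END PROCEDURE.
```

Splitting the work into small internal procedures keeps each X-NODEREF local to the node it is building, which avoids reusing one reference for several nodes at once.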
Creating Nodes
The Progress 4GL uses a common CREATE-NODE method on an X-DOCUMENT object to create XML nodes of the various SUBTYPEs. The complete list of DOM interfaces that inherit from Node is as follows:
DocumentType: Represents the Document Type Declaration or Schema declaration of the XML document.
DocumentFragment: Represents a lightweight object used to store sections of an XML document temporarily.
Element: Represents an element node. This interface represents the data, or more precisely the tags, of the XML document. It is important to notice that the text of an element is stored in a Text or CDATASection node that is a child of the element.
Attribute: Represents an attribute of an element. Typically the allowable values for the attribute are defined in a document type definition. Note that attributes are NOT considered child nodes of the element they describe.
Entity: Represents an entity, either parsed or unparsed, in the XML document.
EntityReference: Represents a reference to an entity within the XML document.
Notation: Represents a notation declared within the DTD.
CharacterData: Provides methods and properties that are inherited by Text, Comment, and CDATASection.
Text: Represents a Text node that is a child of an element node.
CDATASection: CDATA sections are used to escape blocks of text that would otherwise be regarded as markup. Their primary purpose is to include XML fragments without needing to escape all the delimiters.
Comment: Represents the content of a comment.
ProcessingInstruction: A processing instruction is a way to keep processor-specific information in the text of the document.
DOMImplementation: Provides access to methods and properties that are application specific and independent of any specific implementation of the DOM. Unlike the other interfaces listed here, DOMImplementation does not itself inherit from Node.
The Progress 4GL provides the following SUBTYPEs for the X-NODEREF object:
- CDATA-SECTION
- COMMENT
- ELEMENT
- ENTITY-REFERENCE
- PROCESSING-INSTRUCTION
- TEXT
The default SUBTYPE is ELEMENT. The simplified DOM Node attributes nodeName and nodeValue give access to these interfaces as follows:
Type                   nodeName                    nodeValue
DocumentType           Document type name          null
DocumentFragment       #document-fragment          null
Element                Tag name                    null
Attribute              Name of attribute           Value of attribute
Entity                 Entity name                 null
EntityReference        Name of entity referenced   null
Notation               Notation name               null
Text                   #text                       Content of the text node
CDATASection           #cdata-section              Content of the CDATA section
Comment                #comment                    Content of the comment
ProcessingInstruction  Target                      Content excluding the target
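As a small illustration of the SUBTYPE list and the nodeName/nodeValue table, the sketch below creates a COMMENT node and a CDATA-SECTION node and sets their content through NODE-VALUE. The element name note and the file name note.xml are hypothetical. It passes an empty name to CREATE-NODE for these node types, as the examples in this paper do for TEXT nodes; according to the table above, NAME should then report #comment and #cdata-section.

```abl
/* Sketch: create COMMENT and CDATA-SECTION nodes and inspect them.
   The element name "note" and the file name note.xml are hypothetical. */
DEFINE VARIABLE hDoc     AS HANDLE NO-UNDO.
DEFINE VARIABLE hRoot    AS HANDLE NO-UNDO.
DEFINE VARIABLE hComment AS HANDLE NO-UNDO.
DEFINE VARIABLE hCData   AS HANDLE NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hComment.
CREATE X-NODEREF hCData.

hDoc:CREATE-NODE(hRoot, "note", "ELEMENT").
hDoc:APPEND-CHILD(hRoot).

/* A comment: its text is the node's nodeValue. */
hDoc:CREATE-NODE(hComment, "", "COMMENT").
hComment:NODE-VALUE = " generated by a 4GL program ".
hRoot:APPEND-CHILD(hComment).

/* A CDATA section: the angle brackets stay text, not markup. */
hDoc:CREATE-NODE(hCData, "", "CDATA-SECTION").
hCData:NODE-VALUE = "<b>not parsed as markup</b>".
hRoot:APPEND-CHILD(hCData).

/* SUBTYPE gives the 4GL node type; NAME gives the DOM nodeName. */
MESSAGE hComment:SUBTYPE hComment:NAME hComment:NODE-VALUE.
MESSAGE hCData:SUBTYPE hCData:NAME hCData:NODE-VALUE.

hDoc:SAVE("file", "note.xml").

DELETE OBJECT hCData.
DELETE OBJECT hComment.
DELETE OBJECT hRoot.
DELETE OBJECT hDoc.
```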
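Names of elements, attributes, and so on in XML documents are constrained by the rules laid down in the XML specification. The gist of these rules is that:
- Names must begin with a letter or an underscore; after the first character they may also contain digits, hyphens, and periods. (A colon is also permitted, but it is reserved for use with XML namespaces.)
- Names beginning with the string "xml", or with anything that would match it if case were insignificant, are reserved and should not be used.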
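Because the examples in this paper use database field names as element names, it can help to check a candidate name before calling CREATE-NODE. The function below, IsXmlName, is a hypothetical helper and a deliberately conservative sketch of the rules above: it accepts only ASCII letters, digits, underscores, hyphens, and periods, so it rejects colons and non-ASCII letters even though XML itself allows them, and it treats the reserved "xml" prefix as invalid.

```abl
/* Hypothetical helper: conservative check of a candidate XML element name.
   Only ASCII letters, digits, '_', '-' and '.' are accepted; colons and
   non-ASCII letters, which XML allows, are rejected here. */
FUNCTION IsXmlName RETURNS LOGICAL (INPUT pcName AS CHARACTER):
    DEFINE VARIABLE i AS INTEGER   NO-UNDO.
    DEFINE VARIABLE c AS CHARACTER NO-UNDO.

    IF pcName = ? OR pcName = "" THEN RETURN FALSE.

    /* Reserved prefix; BEGINS ignores case for ordinary CHARACTER values. */
    IF pcName BEGINS "xml" THEN RETURN FALSE.

    DO i = 1 TO LENGTH(pcName):
        c = SUBSTRING(pcName, i, 1).
        /* Letters and underscore are allowed in any position. */
        IF INDEX("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_", c) > 0 THEN NEXT.
        /* Digits, hyphen and period are allowed after the first character. */
        IF i > 1 AND INDEX("0123456789-.", c) > 0 THEN NEXT.
        RETURN FALSE.
    END.

    RETURN TRUE.
END FUNCTION.

/* Cust-Num passes; Sales# fails because of the '#' character. */
MESSAGE IsXmlName("Cust-Num") IsXmlName("Sales#").
```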
The Progress 4GL NAME attribute is used to obtain a node's nodeName, while the new NODE-VALUE character attribute is used to obtain or set the node's nodeValue.
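The sketch below uses NAME to find elements and NODE-VALUE both to read and to update text. It assumes the phone list document shown under "What Is XML?" has been saved as phonelist.xml (a hypothetical file name), changes Gus's extension from 5 to 7 in place, and saves the document again. Error checking is omitted for brevity; the next section describes how to add it.

```abl
/* Sketch: change Gus's extension in phonelist.xml (hypothetical file
   holding the phone list example) from 5 to 7. */
DEFINE VARIABLE hDoc   AS HANDLE    NO-UNDO.
DEFINE VARIABLE hRoot  AS HANDLE    NO-UNDO.
DEFINE VARIABLE hEntry AS HANDLE    NO-UNDO.
DEFINE VARIABLE hField AS HANDLE    NO-UNDO.
DEFINE VARIABLE hText  AS HANDLE    NO-UNDO.
DEFINE VARIABLE cName  AS CHARACTER NO-UNDO.
DEFINE VARIABLE i      AS INTEGER   NO-UNDO.
DEFINE VARIABLE j      AS INTEGER   NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hEntry.
CREATE X-NODEREF hField.
CREATE X-NODEREF hText.

hDoc:LOAD("file", "phonelist.xml", FALSE).
hDoc:GET-DOCUMENT-ELEMENT(hRoot).

DO i = 1 TO hRoot:NUM-CHILDREN:
    hRoot:GET-CHILD(hEntry, i).
    /* NAME is the nodeName: the tag name for elements, "#text" for text. */
    IF hEntry:NAME <> "entry" THEN NEXT.

    /* Read the entry's <name> text through NODE-VALUE. */
    cName = "".
    DO j = 1 TO hEntry:NUM-CHILDREN:
        hEntry:GET-CHILD(hField, j).
        IF hField:NAME = "name" AND hField:NUM-CHILDREN > 0 THEN DO:
            hField:GET-CHILD(hText, 1).
            cName = hText:NODE-VALUE.
        END.
    END.
    IF cName <> "Gus" THEN NEXT.

    /* Assigning NODE-VALUE updates the text node in place. */
    DO j = 1 TO hEntry:NUM-CHILDREN:
        hEntry:GET-CHILD(hField, j).
        IF hField:NAME = "extension" AND hField:NUM-CHILDREN > 0 THEN DO:
            hField:GET-CHILD(hText, 1).
            hText:NODE-VALUE = "7".
        END.
    END.
END.

hDoc:SAVE("file", "phonelist.xml").

DELETE OBJECT hText.
DELETE OBJECT hField.
DELETE OBJECT hEntry.
DELETE OBJECT hRoot.
DELETE OBJECT hDoc.
```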
Error Handling
Any of the methods of the X-DOCUMENT and X-NODEREF objects may encounter an error condition and fail, but this does not normally raise the Progress ERROR condition. Instead, the method generally returns FALSE. In addition, the parser may encounter errors that do not cause the operation as a whole to fail. So instead of testing ERROR-STATUS:ERROR after running a method with NO-ERROR, you should test whether ERROR-STATUS:NUM-MESSAGES is greater than 0.
Note that the DOM parser may detect errors in an input XML document even if validation is not requested in the LOAD() method call. Validation checks the document for conformance to a DTD, but a document can also contain structural errors, such as a missing end tag or mismatched or improperly nested tags. The parser reports these errors even when validation against a DTD is not performed.
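A minimal sketch of the checks described above: run the method with NO-ERROR, then look at both its return value and ERROR-STATUS:NUM-MESSAGES, and report every message with ERROR-STATUS:GET-MESSAGE(). The file name cust.xml matches the one written by the output example later in this paper but is only an illustration here.

```abl
/* Sketch: check the outcome of an X-DOCUMENT method as recommended above. */
DEFINE VARIABLE hDoc AS HANDLE  NO-UNDO.
DEFINE VARIABLE lOk  AS LOGICAL NO-UNDO.
DEFINE VARIABLE i    AS INTEGER NO-UNDO.

CREATE X-DOCUMENT hDoc.

/* NO-ERROR keeps a failure from stopping the procedure;
   the outcome is inspected below instead. */
lOk = hDoc:LOAD("file", "cust.xml", FALSE) NO-ERROR.

/* ERROR-STATUS:ERROR alone is not enough: check the return value
   and whether the parser left any messages. */
IF NOT lOk OR ERROR-STATUS:NUM-MESSAGES > 0 THEN DO:
    MESSAGE "LOAD returned" lOk "with" ERROR-STATUS:NUM-MESSAGES "message(s)".
    DO i = 1 TO ERROR-STATUS:NUM-MESSAGES:
        MESSAGE ERROR-STATUS:GET-MESSAGE(i).
    END.
END.

DELETE OBJECT hDoc.
```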
3 According to the XML recommendation, entities encoded in UTF-16 must begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors must be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.
Examples
In this section, we provide several simple examples of 4GL programs that use XML, along with the XML documents they use. These examples show how to:
- Load an XML document
- Save an XML document
- Generate an XML document that contains data obtained from the database
- Obtain data from an XML document and store it in the database
Example 1: XML Output from 4GL
/* Define the variables we need. */
DEFINE VARIABLE hDoc   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRoot  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRow   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hField AS HANDLE  NO-UNDO.
DEFINE VARIABLE hText  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hBuf   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hDBFld AS HANDLE  NO-UNDO.
DEFINE VARIABLE i      AS INTEGER NO-UNDO.

/* Create the objects we need. */
CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hRow.
CREATE X-NODEREF hField.
CREATE X-NODEREF hText.

/* Get a buffer for the Customer table. */
hBuf = BUFFER Customer:HANDLE.

/* Set up a root node. */
hDoc:CREATE-NODE(hRoot, "Customers", "ELEMENT").
hDoc:APPEND-CHILD(hRoot).

FOR EACH Customer WHERE Customer.Cust-Num < 5:
    /* Create a customer row node and put it in the tree. */
    hDoc:CREATE-NODE(hRow, "Customer", "ELEMENT").
    hRoot:APPEND-CHILD(hRow).

    /* Cust-num and Name are attributes of this element.
       The remaining fields are elements. */
    hRow:SET-ATTRIBUTE("Cust-num", STRING(Customer.Cust-Num)).
    hRow:SET-ATTRIBUTE("Name", Customer.Name).

    /* Add the other fields as elements. */
    REPEAT i = 1 TO hBuf:NUM-FIELDS:
        hDBFld = hBuf:BUFFER-FIELD(i).

        /* We already did Cust-num and Name above, so skip them. */
        IF hDBFld:NAME = "Cust-num" OR hDBFld:NAME = "Name" THEN NEXT.

        /* Create an element with the field name as the tag. This assumes
           every field name is also a legal XML name; the XML naming rules
           and the Progress column-naming rules are not identical. */
        hDoc:CREATE-NODE(hField, hDBFld:NAME, "ELEMENT").
        hRow:APPEND-CHILD(hField).    /* Make the new field a child of the row. */

        hDoc:CREATE-NODE(hText, "", "TEXT").   /* Node to hold the value. */
        hField:APPEND-CHILD(hText).            /* Attach the text to the field. */
        hText:NODE-VALUE = STRING(hDBFld:BUFFER-VALUE).
    END.
END.

/* Write the XML node tree to an XML file. */
hDoc:SAVE("file", "cust.xml").

/* Delete the objects. Note that deleting the document object also
   deletes the DOM structure under it. */
DELETE OBJECT hDoc.
DELETE OBJECT hRoot.
DELETE OBJECT hRow.
DELETE OBJECT hField.
DELETE OBJECT hText.
Example 2: XML Document Produced by Example 1
<?xml version='1.0' encoding='utf-8' ?>
<Customers>
  <Customer Name="Lift Line Skiing" Cust-num="1">
    <Country>USA</Country>
    <Address>276 North Street</Address>
    <Address2> </Address2>
    <City>Boston</City>
    <State>MA</State>
    <Postal-Code>02114</Postal-Code>
    <Contact>Gloria Shepley</Contact>
    <Phone>(617) 450-0087</Phone>
    <Sales-Rep>HXM</Sales-Rep>
    <Credit-Limit>66700</Credit-Limit>
    <Balance>42568</Balance>
    <Terms>Net30</Terms>
    <Discount>35</Discount>
    <Comments>This customer is on credit hold.</Comments>
  </Customer>
  <Customer Name="Urpon Frisbee" Cust-num="2">
    <Country>Finland</Country>
    <Address>Rattipolku 3</Address>
    <Address2></Address2>
    <City>Valkeala</City>
    <State>MA</State>
    <Postal-Code>45360</Postal-Code>
    <Contact>Urpo Leppakoski</Contact>
    <Phone>(60) 532 5471</Phone>
    <Sales-Rep>DKP</Sales-Rep>
    <Credit-Limit>27600</Credit-Limit>
    <Balance>17166</Balance>
    <Terms>Net30</Terms>
    <Discount>35</Discount>
    <Comments>Ship all products 2nd Day Air.</Comments>
  </Customer>
  <Customer Name="Hoops Croquet Co." Cust-num="3">
    <Country>USA</Country>
    <Address>Suite 415</Address>
    <Address2>40 Grove St.</Address2>
    <City>Hingham</City>
    <State>MA</State>
    <Postal-Code>02111</Postal-Code>
    <Contact>Michael Traitser</Contact>
    <Phone>(617) 366-1557</Phone>
    <Sales-Rep>HXM</Sales-Rep>
    <Credit-Limit>75000</Credit-Limit>
    <Balance>66421</Balance>
    <Terms>Net30</Terms>
    <Discount>10</Discount>
    <Comments>This customer is now OFF credit hold.</Comments>
  </Customer>
  <Customer Name="Go Fishing Ltd" Cust-num="4">
    <Country>United Kingdom</Country>
    <Address>Unit 2</Address>
    <Address2>83 Ponders End Rd</Address2>
    <City>Harrow</City>
    <State>MA</State>
    <Postal-Code>HA8 7BN</Postal-Code>
    <Contact>Alan Frogbrook</Contact>
    <Phone>081 883 6827</Phone>
    <Sales-Rep>SLS</Sales-Rep>
    <Credit-Limit>15000</Credit-Limit>
    <Balance>689</Balance>
    <Terms>Net30</Terms>
    <Discount>10</Discount>
    <Comments> </Comments>
  </Customer>
</Customers>
Now that you've had a chance to read through the example and the output it produces (you did read it, didn't you?), we'd like to make a few observations about it:
- As previously mentioned, the line breaks aren't really there. We added them later to make the example more readable. The original document contained only two lines: the prologue and the body.
- There is no DTD.
- We chose to make the Name and Cust-num columns of the Customer table into attributes of the Customer element. The remaining columns are elements contained by the Customer element. Normally you probably wouldn't make some table columns into attributes and some into elements, but we wanted to show you that you can do it either way.
- Columns that have unknown values show up as empty elements: there is no text between the opening and closing tags.
- The customer records expressed as XML are verbose. That is the nature of XML.
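To make the "either way" point concrete, here is a sketch of a variation on Example 1 that writes every Customer column as an attribute of its Customer element. It makes the same assumption as Example 1 (a connected database with a Customer table), and the output file name cust-attrs.xml is hypothetical. It also illustrates one possible design choice for unknown values: the attribute is simply omitted, so a reader of the document can tell an unknown value apart from an empty string.

```abl
/* Sketch: variation on Example 1 in which every Customer column becomes
   an attribute. cust-attrs.xml is a hypothetical output file name. */
DEFINE VARIABLE hDoc   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRoot  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRow   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hBuf   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hDBFld AS HANDLE  NO-UNDO.
DEFINE VARIABLE i      AS INTEGER NO-UNDO.

CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hRow.

hBuf = BUFFER Customer:HANDLE.

hDoc:CREATE-NODE(hRoot, "Customers", "ELEMENT").
hDoc:APPEND-CHILD(hRoot).

FOR EACH Customer WHERE Customer.Cust-Num < 5:
    hDoc:CREATE-NODE(hRow, "Customer", "ELEMENT").
    hRoot:APPEND-CHILD(hRow).

    DO i = 1 TO hBuf:NUM-FIELDS:
        hDBFld = hBuf:BUFFER-FIELD(i).
        /* Design choice: leave the attribute out when the value is
           unknown (?), so it differs from an empty string. */
        IF hDBFld:BUFFER-VALUE = ? THEN NEXT.
        hRow:SET-ATTRIBUTE(hDBFld:NAME, STRING(hDBFld:BUFFER-VALUE)).
    END.
END.

hDoc:SAVE("file", "cust-attrs.xml").

DELETE OBJECT hRow.
DELETE OBJECT hRoot.
DELETE OBJECT hDoc.
```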
Example 3: XML Input from 4GL
/* Define the variables we need. */
DEFINE VARIABLE hDoc   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hRoot  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hTable AS HANDLE  NO-UNDO.
DEFINE VARIABLE hField AS HANDLE  NO-UNDO.
DEFINE VARIABLE hText  AS HANDLE  NO-UNDO.
DEFINE VARIABLE hBuf   AS HANDLE  NO-UNDO.
DEFINE VARIABLE hDBFld AS HANDLE  NO-UNDO.
DEFINE VARIABLE i      AS INTEGER NO-UNDO.
DEFINE VARIABLE j      AS INTEGER NO-UNDO.

/* Create the objects we will need. */
CREATE X-DOCUMENT hDoc.
CREATE X-NODEREF hRoot.
CREATE X-NODEREF hTable.
CREATE X-NODEREF hField.
CREATE X-NODEREF hText.

/* Get a buffer for the Customer table. */
hBuf = BUFFER Customer:HANDLE.

/* Read in the file created by the output example. Note that the
   entire file is read and parsed at this point. */
hDoc:LOAD("file", "cust.xml", FALSE).

/* Get the root of the structure. */
hDoc:GET-DOCUMENT-ELEMENT(hRoot).

/* Read each Customer from the root. */
REPEAT i = 1 TO hRoot:NUM-CHILDREN:
    hRoot:GET-CHILD(hTable, i).

    /* Create a customer record in the database. */
    CREATE Customer.

    /* Cust-num and Name are attributes, so we have to get
       those values from there. */
    Customer.Cust-Num = INTEGER(hTable:GET-ATTRIBUTE("Cust-num")).
    Customer.Name     = hTable:GET-ATTRIBUTE("Name").

    /* The remaining fields are elements with text values. */
    REPEAT j = 1 TO hTable:NUM-CHILDREN:
        hTable:GET-CHILD(hField, j).

        /* Skip any null values. */
        IF hField:NUM-CHILDREN < 1 THEN NEXT.

        /* Get the text value of the field and put it in the customer
           buffer. This assumes each element name is the name of a
           Customer field; the XML naming rules and the Progress
           column-naming rules are not identical, so that is not
           guaranteed in general. */
        hDBFld = hBuf:BUFFER-FIELD(hField:NAME).
        hField:GET-CHILD(hText, 1).
        hDBFld:BUFFER-VALUE = hText:NODE-VALUE.
    END.
END.

/* Delete the objects we created. Note that when we delete hDoc,
   the structure under it is deleted as well. */
DELETE OBJECT hDoc.
DELETE OBJECT hRoot.
DELETE OBJECT hTable.
DELETE OBJECT hField.
DELETE OBJECT hText.
References
XML
W3C (World Wide Web Consortium). Extensible Markup Language (XML) 1.0. Available at http://www.w3.org/TR/REC-xml.
Charles F. Goldfarb and Paul Prescod. The XML Handbook. Prentice Hall PTR, 1998. ISBN 0-13-081152-1.
DOM
W3C (World Wide Web Consortium). Document Object Model (DOM) Level 1 Specification, 01 October 1998. Available at http://www.w3.org/TR/1998/REC-DOM-Level-1.
DOM Parser
IBM. IBM's XML for C++ Parser. Documentation and software available at http://www.alphaWorks.ibm.com/tech/xml4c.
UNICODE
The Unicode Consortium. The Unicode Standard, Version 2.0. Addison-Wesley Developers Press, 1996.
Corporate Headquarters Progress Software Corporation, 14 Oak Park, Bedford, MA 01730 USA Tel: 781 280 4000 Fax: 781 280 4095 Europe/Middle East/Africa Headquarters Progress Software Europe B.V. Schorpioenstraat 67 3067 GG Rotterdam, The Netherlands Tel: 31 10 286 5700 Fax: 31 10 286 5777 Latin American Headquarters Progress Software Corporation, 2255 Glades Road, One Boca Place, Suite 300 E, Boca Raton, FL 33431 USA Tel: 561 998 2244 Fax: 561 998 1573 Asia/Pacific Headquarters Progress Software Pty. Ltd., 1911 Malvern Road, Malvern East, 3145, Australia Tel: 61 39 885 0544 Fax: 61 39 885 9473 Progress and IPQoS are registered trademarks of Progress Software Corporation. All other trademarks, marked and not marked, are the property of their respective owners.
www.progress.com
Specifications subject to change without notice. © 1999 Progress Software Corporation. All rights reserved.