Using Web Services: Python For Informatics: Exploring Information
Chapter 13
Agreeing on a “Wire Format”
Python Dictionary -> Serialize -> XML -> De-Serialize -> Java HashMap
<person>
  <name>Chuck</name>
  <phone>303 4456</phone>
</person>
Python Dictionary -> Serialize -> JSON -> De-Serialize -> Java HashMap
{
  "name" : "Chuck",
  "phone" : "303-4456"
}
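As a minimal sketch of the serialize / de-serialize round trip on the Python side, the standard json module can turn a dictionary into JSON text and back again. The variable names (person, wire, received) are illustrative and do not come from the slides.

```python
import json

# The in-memory Python dictionary from the diagram.
person = {"name": "Chuck", "phone": "303-4456"}

# Serialize: dictionary -> JSON text that can travel across the network.
wire = json.dumps(person)

# De-serialize: JSON text -> dictionary again on the receiving side.
received = json.loads(wire)

print(received["name"])
```

A Java program receiving the same text would de-serialize it into a HashMap instead; the agreed wire format is what lets the two sides differ.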
XML “Elements” (or Nodes)
• Simple Element (e.g. <name>Chuck</name>)
• Complex Element (e.g. <person>, which contains other elements)
<people>
  <person>
    <name>Chuck</name>
    <phone>303 4456</phone>
  </person>
  <person>
    <name>Noah</name>
    <phone>622 7421</phone>
  </person>
</people>
XML
Marking up data to send across the network...
http://en.wikipedia.org/wiki/XML
eXtensible Markup Language
• Primary purpose is to help information systems share structured data
http://en.wikipedia.org/wiki/XML
XML Basics
• Start Tag
• End Tag
• Text Content
• Attribute
• Self Closing Tag
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
White Space
Line ends do not matter. White space is generally discarded on text elements. We indent only to be readable.
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>
<person>
<name>Chuck</name>
<phone type="intl">+1 734 303
4456</phone>
<email hide="yes" />
</person>
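A small sketch of how the two layouts above can be treated as the same data in Python. Note that ElementTree hands back text content exactly as written, so the program collapses the white space itself; the helper name phone_number is hypothetical.

```python
import xml.etree.ElementTree as ET

indented = '''<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>'''

compact = '''<person>
<name>Chuck</name>
<phone type="intl">+1 734 303
4456</phone>
<email hide="yes" />
</person>'''

def phone_number(xml_text):
    # Collapse every run of white space (including line ends) into one space.
    raw = ET.fromstring(xml_text).find('phone').text
    return ' '.join(raw.split())

print(phone_number(indented))
print(phone_number(compact))
```

Both calls produce the same phone number, which is the sense in which indentation and line ends do not matter.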
Some XML...
http://en.wikipedia.org/wiki/XML
XML Terminology
http://en.wikipedia.org/wiki/Serialization
XML as a Tree
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
        a
       / \
      b   c
      |  / \
      X d   e
        |   |
        Y   Z
Elements: a, b, c, d, e
Text: X, Y, Z
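To make the tree view concrete, here is a minimal sketch that walks the same document with ElementTree and builds an indented outline. The function name walk and the outline format are illustrative choices, not part of the slides.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<a><b>X</b><c><d>Y</d><e>Z</e></c></a>')

def walk(node, depth=0):
    # One line per element, indented by depth, followed by its text if any.
    text = (node.text or '').strip()
    lines = ['  ' * depth + node.tag + (' ' + text if text else '')]
    for child in node:          # iterating an element yields its children
        lines.extend(walk(child, depth + 1))
    return lines

outline = walk(doc)
print('\n'.join(outline))
```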
XML Text and Attributes
<a>
  <b w="5">X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
        a
       / \
      b   c
     / \ / \
    w  X d  e
    |    |  |
    5    Y  Z
Under b: w is an attrib node (value 5), X is a text node.
Elements: a, b, c, d, e
Text: X, Y, Z
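As a short sketch, ElementTree exposes the two kinds of node under b separately: the text through .text and the attribute through .get(). The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<a><b w="5">X</b><c><d>Y</d><e>Z</e></c></a>')
b = doc.find('b')

b_text = b.text      # the text node under b
b_w = b.get('w')     # the attribute node under b; the value is a string

print('text=%s w=%s' % (b_text, b_w))
```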
XML as Paths
<a>
  <b>X</b>
  <c>
    <d>Y</d>
    <e>Z</e>
  </c>
</a>
Path      Text
/a/b      X
/a/c/d    Y
/a/c/e    Z
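The path view maps fairly directly onto ElementTree's find(). A minimal sketch follows; the one assumption to keep in mind is that find() takes paths relative to the element it is called on, so from the root a the path /a/c/d is written 'c/d'.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a><b>X</b><c><d>Y</d><e>Z</e></c></a>')

# Paths relative to the root element <a>.
paths = ['b', 'c/d', 'c/e']

# Map each full path to the text found at the end of it.
values = dict(('/a/' + p, a.find(p).text) for p in paths)

for p in paths:
    print('/a/%s %s' % (p, a.find(p).text))
```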
XML Schema
Describing a “contract” as to what is acceptable XML.
http://en.wikipedia.org/wiki/Xml_schema
http://en.wikibooks.org/wiki/XML_Schema
XML Schema
• Description of the legal format of an XML document
XML Validation
XML Document + XML Schema Contract -> Validator
XML Document
<person>
  <lastname>Severance</lastname>
  <age>17</age>
  <dateborn>2001-04-17</dateborn>
</person>
XML Schema Contract
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="dateborn" type="xs:date"/>
  </xs:sequence>
</xs:complexType>
Many XML Schema Languages
• Document Type Definition (DTD)
• http://en.wikipedia.org/wiki/Document_Type_Definition
• http://en.wikipedia.org/wiki/SGML
• http://en.wikipedia.org/wiki/XML_Schema_(W3C)
http://en.wikipedia.org/wiki/Xml_schema
XSD XML Schema (W3C spec)
• We will focus on the World Wide Web Consortium (W3C) version
http://www.w3.org/XML/Schema
http://en.wikipedia.org/wiki/XML_Schema_(W3C)
XSD Structure
• xs:element
• xs:sequence
• xs:complexType
<person>
  <lastname>Severance</lastname>
  <age>17</age>
  <dateborn>2001-04-17</dateborn>
</person>
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="dateborn" type="xs:date"/>
  </xs:sequence>
</xs:complexType>
XSD Constraints
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="full_name" type="xs:string"
                  minOccurs="1" maxOccurs="1" />
      <xs:element name="child_name" type="xs:string"
                  minOccurs="0" maxOccurs="10" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
<person>
  <full_name>Tove Refsnes</full_name>
  <child_name>Hege</child_name>
  <child_name>Stale</child_name>
  <child_name>Jim</child_name>
  <child_name>Borge</child_name>
</person>
http://www.w3schools.com/Schema/schema_complex_indicators.asp
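Python's standard library does not ship an XSD validator, so the following is only a hand-written sketch of what the minOccurs / maxOccurs constraints above mean. The LIMITS table and the check_occurs function are hypothetical, copied by hand from the schema rather than read from it, and the sketch ignores the element order that xs:sequence also requires.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) per child element, copied by hand from the schema.
LIMITS = {'full_name': (1, 1), 'child_name': (0, 10)}

def check_occurs(person):
    # Return a list of problems; an empty list means the counts are acceptable.
    problems = []
    for tag, (low, high) in LIMITS.items():
        count = len(person.findall(tag))
        if count < low or count > high:
            problems.append('%s occurs %d times, expected %d..%d'
                            % (tag, count, low, high))
    return problems

good = ET.fromstring('''<person>
  <full_name>Tove Refsnes</full_name>
  <child_name>Hege</child_name>
  <child_name>Stale</child_name>
  <child_name>Jim</child_name>
  <child_name>Borge</child_name>
</person>''')
bad = ET.fromstring('<person><child_name>Hege</child_name></person>')

print(check_occurs(good))
print(check_occurs(bad))
```

A real validator would read the limits from the XSD itself; the point here is only that four child_name elements fall inside 0..10 while a missing full_name violates 1..1.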
XSD Data Types
<xs:element name="customer" type="xs:string"/>
<xs:element name="start" type="xs:date"/>
<xs:element name="startdate" type="xs:dateTime"/>
<xs:element name="prize" type="xs:decimal"/>
<xs:element name="weeks" type="xs:integer"/>
It is common to represent time in UTC/GMT, given that servers are often scattered around the world.
<customer>John Smith</customer>
<start>2002-09-24</start>
<startdate>2002-05-30T09:30:10Z</startdate>
<prize>999.50</prize>
<weeks>30</weeks>
http://www.w3schools.com/Schema/schema_dtypes_numeric.asp
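One way to read the type declarations above is as a promise about which Python conversion will succeed on each value. The sketch below assumes a hand-written mapping from XSD type names to Python converters (CONVERTERS and TYPES are hypothetical) and wraps the sample elements in an invented order element so they form one document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from XSD type names to Python converters.
CONVERTERS = {
    'xs:string': str,
    'xs:integer': int,
    'xs:decimal': Decimal,
    'xs:date': lambda s: datetime.strptime(s, '%Y-%m-%d').date(),
    'xs:dateTime': lambda s: datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ'),
}

# Element name -> declared type, copied by hand from the schema lines.
TYPES = {'customer': 'xs:string', 'start': 'xs:date',
         'startdate': 'xs:dateTime', 'prize': 'xs:decimal',
         'weeks': 'xs:integer'}

# The <order> wrapper is invented for this example.
doc = ET.fromstring('''<order>
  <customer>John Smith</customer>
  <start>2002-09-24</start>
  <startdate>2002-05-30T09:30:10Z</startdate>
  <prize>999.50</prize>
  <weeks>30</weeks>
</order>''')

# A value that does not fit its declared type raises an exception here
# (ValueError, or decimal.InvalidOperation for xs:decimal).
record = dict((el.tag, CONVERTERS[TYPES[el.tag]](el.text)) for el in doc)

print(record['weeks'] + 1)
```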
ISO 8601 Date/Time Format
2002-05-30T09:30:10Z
• 2002-05-30: Year-month-day
• 09:30:10: Time of day
• Z: Timezone, typically specified in UTC / GMT rather than local time zone.
http://en.wikipedia.org/wiki/ISO_8601
http://en.wikipedia.org/wiki/Coordinated_Universal_Time
http://www.w3schools.com/Schema/schema_example.asp
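A minimal sketch of reading the timestamp above with the standard datetime module. It assumes Python 3 (for datetime.timezone) and handles only this exact layout with a trailing Z, not every form ISO 8601 allows.

```python
from datetime import datetime, timezone

stamp = '2002-05-30T09:30:10Z'

# The trailing Z means UTC. strptime matches it as a literal character,
# so the time zone is attached explicitly afterwards.
when = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)

print(when.date().isoformat())   # year-month-day part
print(when.time().isoformat())   # time-of-day part
```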
xml1.py
import xml.etree.ElementTree as ET
data = '''<person>
<name>Chuck</name>
<phone type="intl">
+1 734 303 4456
</phone>
<email hide="yes"/>
</person>'''
tree = ET.fromstring(data)
print 'Name:',tree.find('name').text
print 'Attr:',tree.find('email').get('hide')
xml2.py
import xml.etree.ElementTree as ET
input = '''<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''
stuff = ET.fromstring(input)
lst = stuff.findall('users/user')
print 'User count:', len(lst)
for item in lst:
    print 'Name', item.find('name').text
    print 'Id', item.find('id').text
    print 'Attribute', item.get("x")
JavaScript Object Notation
• Douglas Crockford - “Discovered” JSON
• Object literal notation in JavaScript
http://www.youtube.com/watch?v=kc8BAR7SHJI
json1.py
import json
data = '''{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
  },
  "email" : {
    "hide" : "yes"
  }
}'''
info = json.loads(data)
print 'Name:',info["name"]
print 'Hide:',info["email"]["hide"]
JSON represents data as nested “lists” and “dictionaries”
json2.py
import json
input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''
info = json.loads(input)
print 'User count:', len(info)
for item in info:
    print 'Name', item['name']
    print 'Id', item['id']
    print 'Attribute', item['x']
JSON represents data as nested “lists” and “dictionaries”
Service Oriented Approach
http://en.wikipedia.org/wiki/Service-oriented_architecture
• Most non-trivial web applications use services
http://www.youtube.com/watch?v=mj-kCFzF0ME 5:15
Web Services
http://en.wikipedia.org/wiki/Web_services
Application Program Interface
The API itself is largely abstract in that it specifies an
interface and controls the behavior of the objects
specified in that interface. The software that provides the
functionality described by an API is said to be an
“implementation” of the API. An API is typically defined
in terms of the programming language used to build an
application.
http://en.wikipedia.org/wiki/API
Web Service Technologies
• SOAP - Simple Object Access Protocol (software)
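geojson.py
    # Excerpt from inside a loop (hence the continue);
    # data holds the raw text returned by the service.
    try: js = json.loads(str(data))
    except: js = None
    # js is None when parsing failed, so test it before looking for 'status'.
    if not js or 'status' not in js or js['status'] != 'OK':
        print '==== Failure To Retrieve ===='
        print data
        continue
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print 'lat',lat,'lng',lng
    location = js['results'][0]['formatted_address']
    print location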
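The excerpt depends on a live service, so here is a self-contained sketch of the same parsing step run against a made-up response. The sample data (its address and coordinates) and the function name first_location are hypothetical; only the field names are taken from the excerpt.

```python
import json

# Hypothetical sample response, shaped like the fields the excerpt reads.
data = '''{
  "status": "OK",
  "results": [
    { "formatted_address": "Ann Arbor, MI, USA",
      "geometry": { "location": { "lat": 42.28, "lng": -83.74 } } }
  ]
}'''

def first_location(text):
    # Return (lat, lng, address), or None if the response is unusable.
    try:
        js = json.loads(text)
    except ValueError:          # raised when the text is not valid JSON
        return None
    if not isinstance(js, dict) or js.get('status') != 'OK':
        return None
    top = js['results'][0]
    loc = top['geometry']['location']
    return loc['lat'], loc['lng'], top['formatted_address']

print(first_location(data))
print(first_location('not json'))
```

Returning None for both unparseable text and a non-OK status keeps the caller's failure handling in one place, much as the excerpt's single if does.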
API Security and Rate Limiting
• The compute resources to run these APIs are not “free”
• The data providers might limit the number of requests per day,
demand an API “key”, or even charge for usage
https://api.twitter.com/1.1/statuses/user_timeline.json?count=2&oauth_version=1.0&oauth_token=101...SGI&screen_name=drchuck&oauth_nonce=09239679&oauth_timestamp=1380395644&oauth_signature=rLK...BoD&oauth_consumer_key=h7Lu...GNg&oauth_signature_method=HMAC-SHA1
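To see how much of a signed request is security bookkeeping, the sketch below splits the URL above into its query parameters using the standard library. The "..." elisions in the token, signature, and key are kept exactly as shown, so this URL is for inspection only and would not authenticate.

```python
try:
    from urllib.parse import urlparse, parse_qs   # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs       # Python 2

# The signed URL from above, with its elided values left as they are.
url = ('https://api.twitter.com/1.1/statuses/user_timeline.json?count=2'
       '&oauth_version=1.0&oauth_token=101...SGI&screen_name=drchuck'
       '&oauth_nonce=09239679&oauth_timestamp=1380395644'
       '&oauth_signature=rLK...BoD&oauth_consumer_key=h7Lu...GNg'
       '&oauth_signature_method=HMAC-SHA1')

query = parse_qs(urlparse(url).query)

# Separate the request's own parameters from the OAuth parameters.
oauth = sorted(k for k in query if k.startswith('oauth_'))
plain = sorted(k for k in query if not k.startswith('oauth_'))

print(', '.join(plain))
print(', '.join(oauth))
```

Only count and screen_name describe what is being asked for; the remaining seven parameters exist to identify and authorize the caller.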
Summary
• Service Oriented Architecture - allows an application to be broken
into parts and distributed across a network