What's New in Python: A. M. Kuchling
Release 2.7.15
A. M. Kuchling
Regular Python dictionaries iterate over key/value pairs in arbitrary order. Over the years, a number of
authors have written alternative implementations that remember the order that the keys were originally
inserted. Based on the experiences from those implementations, 2.7 introduces a new OrderedDict class in
the collections module.
The OrderedDict API provides the same interface as regular dictionaries but iterates over keys and values
in a guaranteed order depending on when a key was first inserted:
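The session that presumably appeared here would look something like this sketch (the key names are taken from the overwrite example that follows):

```python
from collections import OrderedDict

# Keys iterate in the order they were first inserted.
d = OrderedDict([('first', 1), ('second', 2), ('third', 3)])
print(list(d))   # ['first', 'second', 'third']
```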
If a new entry overwrites an existing entry, the original insertion position is left unchanged:
>>> d['second'] = 4
>>> d.items()
[('first', 1), ('second', 4), ('third', 3)]
The popitem() method has an optional last argument that defaults to True. If last is true, the most recently
added key is returned and removed; if it’s false, the oldest key is selected:
>>> od = OrderedDict([(x,0) for x in range(20)])
>>> od.popitem()
(19, 0)
>>> od.popitem()
(18, 0)
>>> od.popitem(last=False)
(0, 0)
>>> od.popitem(last=False)
(1, 0)
Comparing two ordered dictionaries checks both the keys and values, and requires that the insertion order
was the same:
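The elided comparison example might be sketched as follows (the dictionaries and key names are illustrative):

```python
from collections import OrderedDict

d1 = OrderedDict([('a', 1), ('b', 2)])
d2 = OrderedDict([('b', 2), ('a', 1)])   # same pairs, different insertion order
print(d1 == d2)        # False: two OrderedDicts also compare insertion order
print(d1 == dict(d2))  # True: against a plain dict, order is ignored
```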
Comparing an OrderedDict with a regular dictionary ignores the insertion order and just compares the keys
and values.
How does the OrderedDict work? It maintains a doubly-linked list of keys, appending new keys to the list
as they’re inserted. A secondary dictionary maps keys to their corresponding list node, so deletion doesn’t
have to traverse the entire linked list and therefore remains O(1).
The standard library now supports use of ordered dictionaries in several modules.
• The ConfigParser module uses them by default, meaning that configuration files can now be read,
modified, and then written back in their original order.
• The _asdict() method for collections.namedtuple() now returns an ordered dictionary with the
values appearing in the same order as the underlying tuple indices.
• The json module’s JSONDecoder class constructor was extended with an object_pairs_hook parameter
to allow OrderedDict instances to be built by the decoder. Support was also added for third-party
tools like PyYAML.
See also:
PEP 372 - Adding an ordered dictionary to collections PEP written by Armin Ronacher and Raymond Hettinger; implemented by Raymond Hettinger.
This mechanism is not adaptable at all; commas are always used as the separator and the grouping is always
into three-digit groups. The comma-formatting mechanism isn’t as general as the locale module, but it’s
easier to use.
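A minimal sketch of the comma-formatting mechanism described above:

```python
# The ',' specifier always uses commas and always groups by three digits.
print('{:,}'.format(1234567))        # 1,234,567
print(format(1234567.891, ',.2f'))   # 1,234,567.89
```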
See also:
PEP 378 - Format Specifier for Thousands Separator PEP written by Raymond Hettinger; implemented by Eric Smith.
args = parser.parse_args()
print args.__dict__
Unless you override it, -h and --help switches are automatically added, and produce neatly formatted
output:
Command-line example.
positional arguments:
inputs input filenames (default is stdin)
optional arguments:
-h, --help show this help message and exit
-v produce verbose output
-o FILE direct output to FILE instead of stdout
-C NUM display NUM lines of added context
As with optparse, the command-line switches and arguments are returned as an object with attributes
named by the dest parameters:
argparse has much fancier validation than optparse; you can specify an exact number of arguments as an
integer, 0 or more arguments by passing '*', 1 or more by passing '+', or an optional argument with '?'.
A top-level parser can contain sub-parsers to define subcommands that have different sets of switches, as in
svn commit, svn checkout, etc. You can specify an argument’s type as FileType, which will automatically
open files for you and understands that '-' means standard input or output.
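The features above can be sketched together; the option names below mirror the help output shown earlier, but the exact parser definition is an assumption:

```python
import argparse

parser = argparse.ArgumentParser(description='Command-line example.')
parser.add_argument('-v', action='store_true',
                    help='produce verbose output')
# nargs='*' accepts zero or more positional arguments.
parser.add_argument('inputs', nargs='*', default=['-'],
                    help='input filenames (default is stdin)')

args = parser.parse_args(['-v', 'a.txt', 'b.txt'])
print(args.v, args.inputs)   # True ['a.txt', 'b.txt']
```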
See also:
argparse documentation The documentation page of the argparse module.
argparse-from-optparse Part of the Python documentation, describing how to convert code that uses
optparse.
PEP 389 - argparse - New Command Line Parsing Module PEP written and implemented by
Steven Bethard.
import logging
import logging.config
configdict = {
    'version': 1,    # Configuration schema in use; must be 1 for now
    'formatters': {
        'standard': {
            'format': ('%(asctime)s %(name)-15s '
                       '%(levelname)-8s %(message)s')}},
    # Minimal handler and logger entries (illustrative; the original
    # listing did not close the dictionary)
    'handlers': {
        'console': {'class': 'logging.StreamHandler',
                    'formatter': 'standard'}},
    'loggers': {
        'network': {'handlers': ['console'], 'level': 'INFO'}},
}
# Set up configuration
logging.config.dictConfig(configdict)
netlogger = logging.getLogger('network')
netlogger.error('Connection failed')
Three smaller enhancements to the logging module, all implemented by Vinay Sajip, are:
• The SysLogHandler class now supports syslogging over TCP. The constructor has a socktype parameter
giving the type of socket to use, either socket.SOCK_DGRAM for UDP or socket.SOCK_STREAM for TCP.
The default protocol remains UDP.
• Logger instances gained a getChild() method that retrieves a descendant logger using a relative
path. For example, once you retrieve a logger by doing log = getLogger('app'), calling log.
getChild('network.listen') is equivalent to getLogger('app.network.listen').
• The LoggerAdapter class gained an isEnabledFor() method that takes a level and returns whether
the underlying logger would process a message of that level of importance.
See also:
PEP 391 - Dictionary-Based Configuration For Logging PEP written and implemented by Vinay
Sajip.
Views can be iterated over, but the key and item views also behave like sets. The & operator performs
intersection, and | performs a union:
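A sketch of the set operations (written portably: 2.7's viewkeys() and Python 3's keys() behave the same way, so the lookup below tries both):

```python
d = {'a': 1, 'b': 2, 'c': 3}
e = {'b': 20, 'c': 30, 'd': 40}

# Use viewkeys() where it exists (2.7); fall back to keys() (Python 3).
vk_d = getattr(d, 'viewkeys', d.keys)()
vk_e = getattr(e, 'viewkeys', e.keys)()
print(sorted(vk_d & vk_e))   # ['b', 'c'] -- intersection
print(sorted(vk_d | vk_e))   # ['a', 'b', 'c', 'd'] -- union
```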
The view keeps track of the dictionary and its contents change as the dictionary is modified:
>>> vk = d.viewkeys()
>>> vk
dict_keys([0, 130, 10, ..., 250])
>>> d[260] = '&'
>>> vk
dict_keys([0, 130, 260, 10, ..., 250])
However, note that you can’t add or remove keys while you’re iterating over the view:
You can use the view methods in Python 2.x code, and the 2to3 converter will change them to the standard
keys(), values(), and items() methods.
See also:
PEP 3106 - Revamping dict.keys(), .values() and .items() PEP written by Guido van Rossum.
Backported to 2.7 by Alexandre Vassalotti; bpo-1967.
9 PEP 3137: The memoryview Object
The memoryview object provides a view of another object’s memory content that matches the bytes type’s
interface.
>>> import string
>>> m = memoryview(string.letters)
>>> m
<memory at 0x37f850>
>>> len(m) # Returns length of underlying object
52
>>> m[0], m[25], m[26] # Indexing returns one byte
('a', 'z', 'A')
>>> m2 = m[0:26] # Slicing returns another memoryview
>>> m2
<memory at 0x37f080>
The content of the view can be converted to a string of bytes or a list of integers:
>>> m2.tobytes()
'abcdefghijklmnopqrstuvwxyz'
>>> m2.tolist()
[97, 98, 99, 100, 101, 102, 103, ... 121, 122]
>>>
memoryview objects allow modifying the underlying object if it’s a mutable object.
>>> m2[0] = 75
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot modify read-only memory
>>> b = bytearray(string.letters) # Creating a mutable object
>>> b
bytearray(b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> mb = memoryview(b)
>>> mb[0] = '*' # Assign to view, changing the bytearray.
>>> b[0:5] # The bytearray has been changed.
bytearray(b'*bcde')
>>>
See also:
PEP 3137 - Immutable Bytes and Mutable Buffer PEP written by Guido van Rossum. Implemented by Travis Oliphant, Antoine Pitrou and others. Backported to 2.7 by Antoine Pitrou; bpo-2396.
is equivalent to:
with A() as a:
with B() as b:
... suite of statements ...
The contextlib.nested() function provides very similar functionality, so it's no longer necessary and has been deprecated.
(Proposed in https://codereview.appspot.com/53094; implemented by Georg Brandl.)
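A sketch of the multiple-context-manager form, using a hypothetical tag() context manager to make the enter/exit order visible:

```python
from contextlib import contextmanager

@contextmanager
def tag(name, log):          # hypothetical helper for illustration
    log.append('enter ' + name)
    yield name
    log.append('exit ' + name)

log = []
with tag('A', log) as a, tag('B', log) as b:
    log.append('body')
# Exits run in reverse order, just like the nested form.
print(log)   # ['enter A', 'enter B', 'body', 'exit B', 'exit A']
```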
• Conversions between floating-point numbers and strings are now correctly rounded on most platforms.
These conversions occur in many different places: str() on floats and complex numbers; the float and
complex constructors; numeric formatting; serializing and deserializing floats and complex numbers
using the marshal, pickle and json modules; parsing of float and imaginary literals in Python code;
and Decimal-to-float conversion.
Related to this, the repr() of a floating-point number x now returns a result based on the shortest
decimal string that’s guaranteed to round back to x under correct rounding (with round-half-to-even
rounding mode). Previously it gave a string based on rounding x to 17 decimal digits.
The rounding library responsible for this improvement works on Windows and on Unix platforms using
the gcc, icc, or suncc compilers. There may be a small number of platforms where correct operation
of this code cannot be guaranteed, so the code is not used on such systems. You can find out which
code is being used by checking sys.float_repr_style, which will be short if the new code is in use
and legacy if it isn’t.
Implemented by Eric Smith and Mark Dickinson, using David Gay’s dtoa.c library; bpo-7117.
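The round-trip guarantee can be checked directly (assuming a platform where sys.float_repr_style is 'short'):

```python
x = 0.1
# The new repr is the shortest string that rounds back to exactly x.
print(repr(x))              # '0.1' rather than a 17-digit approximation
print(float(repr(x)) == x)  # True: the short form still round-trips exactly
```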
• Conversions from long integers and regular integers to floating point now round differently, returning
the floating-point number closest to the number. This doesn’t matter for small integers that can
be converted exactly, but for large numbers that will unavoidably lose precision, Python 2.7 now
approximates more closely. For example, Python 2.6 computed the following:
>>> n = 295147905179352891391
>>> float(n)
2.9514790517935283e+20
>>> n - long(float(n))
65535L
Python 2.7’s floating-point result is larger, but much closer to the true value:
>>> n = 295147905179352891391
>>> float(n)
2.9514790517935289e+20
>>> n - long(float(n))
-1L
The auto-numbering takes the fields from left to right, so the first {...} specifier will use the first
argument to str.format(), the next specifier will use the next argument, and so on. You can’t mix
auto-numbering and explicit numbering – either number all of your specifier fields or none of them –
but you can mix auto-numbering and named fields, as in the second example above. (Contributed by
Eric Smith; bpo-5237.)
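Auto-numbering can be sketched as:

```python
# Auto-numbered fields consume arguments left to right.
print('{}:{}'.format('a', 'b'))               # a:b
# Auto-numbering can be mixed with named fields, but not explicit numbers.
print('{} and {name}'.format('x', name='y'))  # x and y
```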
Complex numbers now correctly support usage with format(), and default to being right-aligned.
Specifying a precision or comma-separation applies to both the real and imaginary parts of the number,
but a specified field width and alignment is applied to the whole of the resulting 1.5+3j output.
(Contributed by Eric Smith; bpo-1588 and bpo-7988.)
The ‘F’ format code now always formats its output using uppercase characters, so it will now produce
‘INF’ and ‘NAN’. (Contributed by Eric Smith; bpo-3382.)
A low-level change: the object.__format__() method now triggers a PendingDeprecationWarning
if it’s passed a format string, because the __format__() method for object converts the object to a
string representation and formats that. Previously the method silently applied the format string to
the string representation, but that could hide mistakes in Python code. If you’re supplying formatting
information such as an alignment or precision, presumably you’re expecting the formatting to be
applied in some object-specific way. (Fixed by Eric Smith; bpo-7994.)
• The int() and long() types gained a bit_length method that returns the number of bits necessary
to represent its argument in binary:
>>> n = 37
>>> bin(n)
'0b100101'
>>> n.bit_length()
6
>>> n = 2**123-1
>>> n.bit_length()
123
>>> (n+1).bit_length()
124
export PYTHONWARNINGS=all,error:::Cookie:0
10.2 Optimizations
Several performance enhancements have been added:
• A new opcode was added to perform the initial setup for with statements, looking up the __enter__()
and __exit__() methods. (Contributed by Benjamin Peterson.)
• The garbage collector now performs better for one common usage pattern: when many objects are being
allocated without deallocating any of them. This would previously take quadratic time for garbage
collection, but now the number of full garbage collections is reduced as the number of objects on the
heap grows. The new logic only performs a full garbage collection pass when the middle generation
has been collected 10 times and when the number of survivor objects from the middle generation
exceeds 10% of the number of objects in the oldest generation. (Suggested by Martin von Löwis and
implemented by Antoine Pitrou; bpo-4074.)
• The garbage collector tries to avoid tracking simple containers which can’t be part of a cycle. In
Python 2.7, this is now true for tuples and dicts containing atomic types (such as ints, strings, etc.).
Transitively, a dict containing tuples of atomic types won’t be tracked either. This helps reduce the
cost of each garbage collection by decreasing the number of objects to be considered and traversed by
the collector. (Contributed by Antoine Pitrou; bpo-4688.)
• Long integers are now stored internally either in base 2**15 or in base 2**30, the base being determined
at build time. Previously, they were always stored in base 2**15. Using base 2**30 gives significant
performance improvements on 64-bit machines, but benchmark results on 32-bit machines have been
mixed. Therefore, the default is to use base 2**30 on 64-bit machines and base 2**15 on 32-bit
machines; on Unix, there’s a new configure option --enable-big-digits that can be used to override
this default.
Apart from the performance improvements this change should be invisible to end users, with one
exception: for testing and debugging purposes there’s a new structseq sys.long_info that provides
information about the internal format, giving the number of bits per digit and the size in bytes of the
C type used to store each digit:
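A sketch of reading that structseq (it was renamed sys.int_info in Python 3, so the lookup below tries both names):

```python
import sys

info = getattr(sys, 'long_info', None) or sys.int_info
print(info.bits_per_digit)   # 15 or 30, chosen at build time
print(info.sizeof_digit)     # size in bytes of the C type for each digit
```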
There are three additional Counter methods. most_common() returns the N most common elements
and their counts. elements() returns an iterator over the contained elements, repeating each element
as many times as its count. subtract() takes an iterable and subtracts one for each element instead
of adding; if the argument is a dictionary or another Counter, the counts are subtracted.
>>> c.most_common(5)
[(' ', 6), ('e', 5), ('s', 3), ('a', 2), ('i', 2)]
>>> c.elements() ->
'a', 'a', ' ', ' ', ' ', ' ', ' ', ' ',
'e', 'e', 'e', 'e', 'e', 'g', 'f', 'i', 'i',
'h', 'h', 'm', 'l', 'l', 'o', 'n', 'p', 's',
's', 's', 'r', 't', 't', 'x'
>>> c['e']
5
>>> c.subtract('very heavy on the letter e')
>>> c['e'] # Count is now lower
-1
itertools.combinations_with_replacement('abc', 2) =>
('a', 'a'), ('a', 'b'), ('a', 'c'),
('b', 'b'), ('b', 'c'), ('c', 'c')
Note that elements are treated as unique based on their position in the input, not on their values.
The itertools.count() function now has a step argument that allows incrementing by values other
than 1. count() also now allows keyword arguments, and using non-integer values such as floats or
Decimal instances. (Implemented by Raymond Hettinger; bpo-5032.)
itertools.combinations() and itertools.product() previously raised ValueError for values of r
larger than the input iterable. This was deemed a specification error, so they now return an empty
iterator. (Fixed by Raymond Hettinger; bpo-4816.)
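Both changes can be sketched briefly:

```python
from itertools import combinations, count, islice

# count() now takes a step, including non-integer values.
print(list(islice(count(2.5, 0.5), 4)))   # [2.5, 3.0, 3.5, 4.0]

# r larger than the input yields an empty iterator instead of ValueError.
print(list(combinations('abc', 5)))       # []
```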
• Updated module: The json module was upgraded to version 2.0.9 of the simplejson package, which includes a C extension that makes encoding and decoding faster. (Contributed by Bob Ippolito; bpo-4136.)
To support the new collections.OrderedDict type, json.load() now has an optional object_pairs_hook parameter that will be called with any object literal that decodes to a list of pairs. (Contributed by Raymond Hettinger; bpo-5381.)
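A sketch of the hook in action:

```python
import json
from collections import OrderedDict

# object_pairs_hook receives each object literal as a list of (key, value)
# pairs in document order, so OrderedDict preserves that order.
doc = '{"first": 1, "second": 2, "third": 3}'
d = json.loads(doc, object_pairs_hook=OrderedDict)
print(list(d))   # ['first', 'second', 'third']
```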
• The mailbox module’s Maildir class now records the timestamp on the directories it reads, and only re-
reads them if the modification time has subsequently changed. This improves performance by avoiding
unneeded directory scans. (Fixed by A.M. Kuchling and Antoine Pitrou; bpo-1607951, bpo-6896.)
• New functions: the math module gained erf() and erfc() for the error function and the complementary error function, expm1() which computes e**x - 1 with more precision than using exp() and subtracting 1, gamma() for the Gamma function, and lgamma() for the natural log of the Gamma function. (Contributed by Mark Dickinson and nirinA raseliarison; bpo-3366.)
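A quick sketch of the new functions:

```python
import math

# expm1() keeps precision where exp(x) - 1 cancels almost everything.
x = 1e-12
print(math.expm1(x))     # accurate to full precision
print(math.exp(x) - 1)   # loses most significant digits to cancellation

print(math.erf(0.0), math.erfc(0.0))   # 0.0 1.0
print(math.gamma(5.0))                 # 24.0, i.e. 4!
```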
• The multiprocessing module’s Manager* classes can now be passed a callable that will be called
whenever a subprocess is started, along with a set of arguments that will be passed to the callable.
(Contributed by lekma; bpo-5585.)
The Pool class, which controls a pool of worker processes, now has an optional maxtasksperchild
parameter. Worker processes will perform the specified number of tasks and then exit, causing the
Pool to start a new worker. This is useful if tasks may leak memory or other resources, or if some
tasks will cause the worker to become very large. (Contributed by Charles Cazabon; bpo-6963.)
• The nntplib module now supports IPv6 addresses. (Contributed by Derek Morr; bpo-1664.)
• New functions: the os module wraps the following POSIX system calls: getresgid() and
getresuid(), which return the real, effective, and saved GIDs and UIDs; setresgid() and
setresuid(), which set real, effective, and saved GIDs and UIDs to new values; initgroups(), which
initialize the group access list for the current process. (GID/UID functions contributed by Travis H.;
bpo-6508. Support for initgroups added by Jean-Paul Calderone; bpo-7333.)
The os.fork() function now re-initializes the import lock in the child process; this fixes problems on
Solaris when fork() is called from a thread. (Fixed by Zsolt Cserna; bpo-7242.)
• In the os.path module, the normpath() and abspath() functions now preserve Unicode; if their input
path is a Unicode string, the return value is also a Unicode string. (normpath() fixed by Matt Giuca
in bpo-5827; abspath() fixed by Ezio Melotti in bpo-3426.)
• The pydoc module now has help for the various symbols that Python uses. You can now do help('<<')
or help('@'), for example. (Contributed by David Laban; bpo-4739.)
• The re module’s split(), sub(), and subn() now accept an optional flags argument, for consistency
with the other functions in the module. (Added by Gregory P. Smith.)
• New function: run_path() in the runpy module will execute the code at a provided path argument.
path can be the path of a Python source file (example.py), a compiled bytecode file (example.pyc),
a directory (./package/), or a zip archive (example.zip). If a directory or zip path is provided, it
will be added to the front of sys.path and the module __main__ will be imported. It’s expected that
the directory or zip contains a __main__.py; if it doesn’t, some other __main__.py might be imported
from a location later in sys.path. This makes more of the machinery of runpy available to scripts
that want to mimic the way Python’s command line processes an explicit path name. (Added by Nick
Coghlan; bpo-6816.)
• New function: in the shutil module, make_archive() takes a filename, archive type (zip or tar-
format), and a directory path, and creates an archive containing the directory’s contents. (Added by
Tarek Ziadé.)
shutil’s copyfile() and copytree() functions now raise a SpecialFileError exception when asked
to copy a named pipe. Previously the code would treat named pipes like a regular file by opening them
for reading, and this would block indefinitely. (Fixed by Antoine Pitrou; bpo-3002.)
• The signal module no longer re-installs the signal handler unless this is truly necessary, which fixes
a bug that could make it impossible to catch the EINTR signal robustly. (Fixed by Charles-Francois
Natali; bpo-8354.)
• New functions: in the site module, three new functions return various site- and user-
specific paths. getsitepackages() returns a list containing all global site-packages directories,
getusersitepackages() returns the path of the user’s site-packages directory, and getuserbase()
returns the value of the USER_BASE environment variable, giving the path to a directory that can be
used to store data. (Contributed by Tarek Ziadé; bpo-6693.)
The site module now reports exceptions occurring when the sitecustomize module is imported,
and will no longer catch and swallow the KeyboardInterrupt exception. (Fixed by Victor Stinner;
bpo-3137.)
• The socket module's create_connection() function gained a source_address parameter, a (host, port) 2-tuple giving the source address that will be used for the connection. (Contributed by Eldon Ziegler; bpo-3972.)
The recv_into() and recvfrom_into() methods will now write into objects that support the buffer API, most usefully the bytearray and memoryview objects. (Implemented by Antoine Pitrou; bpo-8104.)
• The SocketServer module’s TCPServer class now supports socket timeouts and disabling the Nagle
algorithm. The disable_nagle_algorithm class attribute defaults to False; if overridden to be true,
new request connections will have the TCP_NODELAY option set to prevent buffering many small
sends into a single TCP packet. The timeout class attribute can hold a timeout in seconds that will
be applied to the request socket; if no request is received within that time, handle_timeout() will
be called and handle_request() will return. (Contributed by Kristján Valur Jónsson; bpo-6192 and
bpo-6267.)
• Updated module: the sqlite3 module has been updated to version 2.6.0 of the pysqlite package.
Version 2.6.0 includes a number of bugfixes, and adds the ability to load SQLite extensions from
shared libraries. Call the enable_load_extension(True) method to enable extensions, and then call
load_extension() to load a particular shared library. (Updated by Gerhard Häring.)
• The ssl module’s SSLSocket objects now support the buffer API, which fixed a test suite failure (fix
by Antoine Pitrou; bpo-7133) and automatically set OpenSSL’s SSL_MODE_AUTO_RETRY, which will
prevent an error code being returned from recv() operations that trigger an SSL renegotiation (fix by
Antoine Pitrou; bpo-8222).
The ssl.wrap_socket() constructor function now takes a ciphers argument that's a string listing the encryption algorithms to be allowed; the format of the string is described in the OpenSSL documentation. (Added by Antoine Pitrou; bpo-8322.)
Another change makes the extension load all of OpenSSL's ciphers and digest algorithms so that they're all available. Previously, some SSL certificates couldn't be verified, reporting an "unknown algorithm" error. (Reported by Beda Kosata, and fixed by Antoine Pitrou; bpo-8484.)
The version of OpenSSL being used is now available as the module attributes ssl.OPENSSL_VERSION
(a string), ssl.OPENSSL_VERSION_INFO (a 5-tuple), and ssl.OPENSSL_VERSION_NUMBER (an integer).
(Added by Antoine Pitrou; bpo-8321.)
• The struct module will no longer silently ignore overflow errors when a value is too large for a
particular integer format code (one of bBhHiIlLqQ); it now always raises a struct.error exception.
(Changed by Mark Dickinson; bpo-1523.) The pack() function will also attempt to use __index__()
to convert and pack non-integers before trying the __int__() method or reporting an error. (Changed
by Mark Dickinson; bpo-8300.)
• New function: the subprocess module’s check_output() runs a command with a specified set of
arguments and returns the command’s output as a string when the command runs without error, or
raises a CalledProcessError exception otherwise.
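A sketch using the Python interpreter itself as the command:

```python
import subprocess
import sys

# check_output() runs the command and returns its standard output,
# raising CalledProcessError if the exit status is nonzero.
out = subprocess.check_output([sys.executable, '-c', 'print(6 * 7)'])
print(out.strip())   # 42 (as a str in 2.7, bytes in Python 3)
```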
(Python 2.7 actually produces slightly different output, since it returns a named tuple instead of a
standard tuple.)
The urlparse module also supports IPv6 literal addresses as defined by RFC 2732 (contributed by
Senthil Kumaran; bpo-2987).
>>> urlparse.urlparse('http://[1080::8:800:200C:417A]/foo')
ParseResult(scheme='http', netloc='[1080::8:800:200C:417A]',
path='/foo', params='', query='', fragment='')
• New class: the WeakSet class in the weakref module is a set that only holds weak references to
its elements; elements will be removed once there are no references pointing to them. (Originally
implemented in Python 3.x by Raymond Hettinger, and backported to 2.7 by Michael Foord.)
• The ElementTree library, xml.etree, no longer escapes ampersands and angle brackets when out-
putting an XML processing instruction (which looks like <?xml-stylesheet href="#style1"?>) or
comment (which looks like <!-- comment -->). (Patch by Neil Muller; bpo-2746.)
• The XML-RPC client and server, provided by the xmlrpclib and SimpleXMLRPCServer modules, have
improved performance by supporting HTTP/1.1 keep-alive and by optionally using gzip encoding to
compress the XML being exchanged. The gzip compression is controlled by the encode_threshold
attribute of SimpleXMLRPCRequestHandler, which contains a size in bytes; responses larger than this
will be compressed. (Contributed by Kristján Valur Jónsson; bpo-6267.)
• The zipfile module’s ZipFile now supports the context management protocol, so you can write with
zipfile.ZipFile(...) as f:. (Contributed by Brian Curtin; bpo-5511.)
zipfile now also supports archiving empty directories and extracts them correctly. (Fixed by
Kuba Wieczorek; bpo-4710.) Reading files out of an archive is faster, and interleaving read() and
readline() now works correctly. (Contributed by Nir Aides; bpo-7610.)
The is_zipfile() function now accepts a file object, in addition to the path names accepted in earlier
versions. (Contributed by Gabriel Genellina; bpo-4756.)
The writestr() method now has an optional compress_type parameter that lets you override the
default compression method specified in the ZipFile constructor. (Contributed by Ronald Oussoren;
bpo-6003.)
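Several of these zipfile changes can be sketched in one go, using an in-memory archive:

```python
import io
import zipfile

buf = io.BytesIO()
# ZipFile now supports the context management protocol, and writestr()
# accepts a per-member compress_type overriding the constructor's default.
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('a.txt', 'hello', compress_type=zipfile.ZIP_STORED)

# is_zipfile() now accepts file objects, not just path names.
print(zipfile.is_zipfile(io.BytesIO(buf.getvalue())))   # True

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    data = zf.read('a.txt')
print(data)   # the stored contents round-trip intact
```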
Consult the unittest module documentation for more details. (Developed in bpo-6001.)
The main() function supports some other new options:
• -b or --buffer will buffer the standard output and standard error streams during each test. If the
test passes, any resulting output will be discarded; on failure, the buffered output will be displayed.
• -c or --catch will cause the control-C interrupt to be handled more gracefully. Instead of interrupting
the test process immediately, the currently running test will be completed and then the partial results
up to the interruption will be reported. If you’re impatient, a second press of control-C will cause an
immediate interruption.
This control-C handler tries to avoid causing problems when the code being tested or the tests being
run have defined a signal handler of their own, by noticing that a signal handler was already set and
calling it. If this doesn’t work for you, there’s a removeHandler() decorator that can be used to mark
tests that should have the control-C handling disabled.
• -f or --failfast makes test execution stop immediately when a test fails instead of continuing to
execute further tests. (Suggested by Cliff Dyer and implemented by Michael Foord; bpo-8074.)
The progress messages now show ‘x’ for expected failures and ‘u’ for unexpected successes when run in
verbose mode. (Contributed by Benjamin Peterson.)
Test cases can raise the SkipTest exception to skip a test (bpo-1034053).
The error messages for assertEqual(), assertTrue(), and assertFalse() failures now provide more in-
formation. If you set the longMessage attribute of your TestCase classes to true, both the standard error
message and any additional message you provide will be printed for failures. (Added by Michael Foord;
bpo-5663.)
The assertRaises() method now returns a context handler when called without providing a callable object
to run. For example, you can write this:
with self.assertRaises(KeyError):
{}['foo']
p = ET.XMLParser(encoding='utf-8')
t = ET.XML("""<root/>""", parser=p)
Errors in parsing XML now raise a ParseError exception, whose instances have a position attribute
containing a (line, column) tuple giving the location of the problem.
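A sketch of catching the new exception:

```python
import xml.etree.ElementTree as ET

err = None
try:
    ET.XML('<root><unclosed></root>')   # mismatched tag
except ET.ParseError as exc:
    err = exc
print(err.position)   # (line, column) tuple locating the problem
```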
• ElementTree’s code for converting trees to a string has been significantly reworked, making it roughly
twice as fast in many cases. The ElementTree.write() and Element.write() methods now have a
method parameter that can be “xml” (the default), “html”, or “text”. HTML mode will output empty
elements as <empty></empty> instead of <empty/>, and text mode will skip over elements and only
output the text chunks. If you set the tag attribute of an element to None but leave its children in
place, the element will be omitted when the tree is written out, so you don’t need to do more extensive
rearrangement to remove a single element.
Namespace handling has also been improved. All xmlns:<whatever> declarations are now output on the root element, not scattered throughout the resulting XML. You can set the default namespace for a tree by setting the default_namespace attribute and can register new prefixes with
register_namespace(). In XML mode, you can use the true/false xml_declaration parameter to
suppress the XML declaration.
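The text output mode can be sketched as:

```python
import xml.etree.ElementTree as ET

elem = ET.XML('<p>Hello <b>world</b>!</p>')
# method="text" skips the markup and emits only the text chunks.
print(ET.tostring(elem, method='text'))   # b'Hello world!' (a str in 2.7)
```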
• New Element method: extend() appends the items from a sequence to the element’s children. Ele-
ments themselves behave like sequences, so it’s easy to move children from one element to another:
t = ET.XML("""<list>
<item>1</item> <item>2</item> <item>3</item>
</list>""")
new = ET.XML('<root/>')
new.extend(t)
# Outputs <root><item>1</item>...</root>
print ET.tostring(new)
• New Element method: iter() yields the children of the element as a generator. It’s also possible to
write for child in elem: to loop over an element’s children. The existing method getiterator()
is now deprecated, as is getchildren() which constructs and returns a list of children.
• New Element method: itertext() yields all chunks of text that are descendants of the element. For
example:
t = ET.XML("""<list>
<item>1</item> <item>2</item> <item>3</item>
</list>""")
# Outputs ['\n ', '1', ' ', '2', ' ', '3', '\n']
print list(t.itertext())
• Deprecated: using an element as a Boolean (i.e., if elem:) would return true if the element had any
children, or false if there were no children. This behaviour is confusing – None is false, but so is a
childless element? – so it will now trigger a FutureWarning. In your code, you should be explicit:
write len(elem) != 0 if you’re interested in the number of children, or elem is not None.
Fredrik Lundh develops ElementTree and produced the 1.3 version; you can read his article describing 1.3 at http://effbot.org/zone/elementtree-13-intro.htm. (Florent Xicluna updated the version included with Python, after discussions on python-dev and in bpo-6472.)
12.1 Capsules
Python 3.1 adds a new C datatype, PyCapsule, for providing a C API to an extension module. A capsule is
essentially the holder of a C void * pointer, and is made available as a module attribute; for example, the
socket module’s API is exposed as socket.CAPI, and unicodedata exposes ucnhash_CAPI. Other extensions
can import the module, access its dictionary to get the capsule object, and then get the void * pointer,
which will usually point to an array of pointers to the module’s various API functions.
There is an existing data type already used for this, PyCObject, but it doesn’t provide type safety. Evil
code written in pure Python could cause a segmentation fault by taking a PyCObject from module A and
somehow substituting it for the PyCObject in module B. Capsules know their own name, and getting the
pointer requires providing the name:
void *vtable;

if (!PyCapsule_IsValid(capsule, "mymodule.CAPI")) {
    PyErr_SetString(PyExc_ValueError, "argument type invalid");
    return NULL;
}
vtable = PyCapsule_GetPointer(capsule, "mymodule.CAPI");
For C extensions:
• C extensions that use integer format codes with the PyArg_Parse* family of functions will now raise
a TypeError exception instead of triggering a DeprecationWarning (bpo-5080).
• Use the new PyOS_string_to_double() function instead of the old PyOS_ascii_strtod() and
PyOS_ascii_atof() functions, which are now deprecated.
For applications that embed Python:
• The PySys_SetArgvEx() function was added, letting applications close a security hole when the existing
PySys_SetArgv() function was used. Check whether you’re calling PySys_SetArgv() and carefully
consider whether the application should be using PySys_SetArgvEx() with updatepath set to false.
15.4 PEP 477: Backport ensurepip (PEP 453) to Python 2.7
The new ensurepip module (defined in PEP 453) provides a standard cross-platform mechanism to boot-
strap the pip installer into Python installations. The version of pip included with Python 2.7.9 is pip 1.5.6,
and future 2.7.x maintenance releases will update the bundled version to the latest version of pip that is
available at the time of creating the release candidate.
By default, the commands pip, pipX and pipX.Y will be installed on all platforms (where X.Y stands for
the version of the Python installation), along with the pip Python package and its dependencies.
For CPython source builds on POSIX systems, the make install and make altinstall commands do
not bootstrap pip by default. This behaviour can be controlled through configure options, and overridden
through Makefile options.
On Windows and Mac OS X, the CPython installers now default to installing pip along with CPython itself
(users may opt out of installing it during the installation process). Windows users will need to opt in to the
automatic PATH modifications to have pip available from the command line by default, otherwise it can still
be accessed through the Python launcher for Windows as py -m pip.
As discussed in the PEP, platform packagers may choose not to install these commands by default, as long
as, when invoked, they provide clear and simple directions on how to install them on that platform (usually
using the system package manager).
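The module can also be used programmatically; a minimal sketch querying the bundled pip version via ensurepip's public API (ensurepip.version() and ensurepip.bootstrap()):

```python
import ensurepip

# Report the version of pip bundled with this Python installation.
print(ensurepip.version())

# Bootstrapping can also be triggered from code; commented out here
# because it would actually install pip for the current user:
# ensurepip.bootstrap(user=True)
```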
Documentation Changes
As part of this change, the installing-index and distributing-index sections of the documentation have been
completely redesigned as short getting started and FAQ documents. Most packaging documentation has
now been moved out to the Python Packaging Authority maintained Python Packaging User Guide and the
documentation of the individual projects.
However, as this migration is currently still incomplete, the legacy versions of those guides remain available
as install-index and distutils-index.
See also:
PEP 453 – Explicit bootstrapping of pip in Python installations PEP written by Donald Stufft
and Nick Coghlan, implemented by Donald Stufft, Nick Coghlan, Martin von Löwis and Ned Deily.
15.5 PEP 476: Enabling certificate verification by default for stdlib http clients
PEP 476 updated httplib and the modules which use it, such as urllib2 and xmlrpclib, so that by default
they verify that the server presents a certificate signed by a Certificate Authority in the platform trust
store and that the certificate's hostname matches the hostname being requested, significantly improving
security for many applications. This change was made in the Python 2.7.9 release.
Applications which require the previous behavior can pass an alternate context:
import urllib2
import ssl
# This allows using a specific certificate for the host, which doesn't need
# to be in the trust store
context = ssl.create_default_context(cafile="/path/to/file.crt")
urllib2.urlopen("https://invalid-cert", context=context)
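The stricter defaults correspond to the settings that ssl.create_default_context() returns; a quick sketch checking them:

```python
import ssl

# The default context used by the stdlib HTTP clients after PEP 476
# requires a valid certificate chain and checks the hostname.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```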
15.6 PEP 493: HTTPS verification migration tools for Python 2.7
PEP 493 provides additional migration tools to support a more incremental infrastructure upgrade process
for environments containing applications and services relying on the historically permissive processing of
server certificates when establishing client HTTPS connections. These additions were made in the Python
2.7.12 release.
These tools are intended for use in cases where affected applications and services can’t be modified to
explicitly pass a more permissive SSL context when establishing the connection.
For applications and services which can’t be modified at all, the new PYTHONHTTPSVERIFY environment
variable may be set to 0 to revert an entire Python process back to the default permissive behaviour of
Python 2.7.8 and earlier.
For cases where the connection establishment code can’t be modified, but the overall application can be, the
new ssl._https_verify_certificates() function can be used to adjust the default behaviour at runtime.
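A sketch of the runtime switch, guarded with hasattr since ssl._https_verify_certificates() is only present on Python 2.7.12 and later:

```python
import ssl

# Flip the stdlib HTTP clients between permissive and verifying behaviour
# at runtime; a no-op on interpreters that lack the PEP 493 backport.
if hasattr(ssl, "_https_verify_certificates"):
    ssl._https_verify_certificates(enable=False)  # permissive, like 2.7.8
    ssl._https_verify_certificates(enable=True)   # restore verification
```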