Unit 5
5.1 Files
A file is a named location on disk used to store related information. It stores data
permanently in non-volatile memory (e.g. a hard disk). Random access memory (RAM) is
volatile and loses its data when the computer is turned off, so we use files to keep data for
future use.
In Windows, for example, a file can be any item manipulated, edited or created by the
user/OS. That means files can be images, text documents, executables, and much more. Most
files are organized by keeping them in individual folders.
In Python, a file is categorized as either text or binary, and the difference between the two file
types is important.
Text files are structured as a sequence of lines, where each line includes a sequence of
characters. Each line is terminated with a special character, called the EOL or End of Line
character. There are several types, but the most common is the newline character, written
'\n' in Python. It ends the current line and signals that a new one has begun.
The backslash in '\n' is an escape character: it tells the interpreter that the character
following it should be treated specially, here as a new line. This is useful when you want a
line break in the text itself without starting a new line in the code.
A binary file is any type of file that is not a text file. Because of their nature, binary files can
only be processed by an application that knows or understands the file's structure. In other
words, the application must be able to read and interpret the binary format.
When we want to read from or write to a file we need to open it first. When we are done, it
needs to be closed, so that resources that are tied with the file are freed.
1. Open a file
2. Read or write (perform operation)
3. Close the file
Opening a file
Python has a built-in function open() to open a file. This function returns a file object, also
called a handle, as it is used to read or modify the file accordingly.
We can specify the mode while opening a file. In mode, we specify whether we want to read
'r', write 'w' or append 'a' to the file. We also specify if we want to open the file in text mode
or binary mode.
If the file already exists, opening it in write mode clears out the old data and starts fresh, so
be careful! If the file doesn’t exist, a new one is created.
The default is reading in text mode. In this mode, we get strings when reading from the file.
On the other hand, binary mode returns bytes and this is the mode to be used when dealing
with non-text files like image or exe files.
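The modes described above can be tried out with a short sketch. It assumes Python 3 and uses a scratch file in a temporary directory; the file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# 'w' creates the file, or clears it if it already exists. Text mode is the default.
f = open(path, "w", encoding="utf-8")
f.write("hello\n")
f.close()

# 'a' appends to the end instead of clearing the old data.
f = open(path, "a", encoding="utf-8")
f.write("world\n")
f.close()

# 'r' in text mode returns strings.
f = open(path, "r", encoding="utf-8")
text = f.read()
f.close()

# Adding 'b' selects binary mode, which returns bytes.
f = open(path, "rb")
raw = f.read()
f.close()

print(type(text), type(raw))
```

Reading the same file twice shows the difference between the two modes: text mode yields a str, binary mode yields a bytes object.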
Closing a File
When we are done with operations on the file, we need to close it properly. Closing a file
frees the resources that were tied to it and is done using the close() method. Python has a
garbage collector to clean up unreferenced objects, but we must not rely on it to close
the file.
>>> f = open("test.txt")
>>> # perform file operations
>>> f.close()
The above method is not entirely safe. If an exception occurs when we are performing some
operation with the file, the code exits without closing the file. A safer way is to use a
try...finally block.
>>> f = open("test.txt", encoding='utf-8')
>>> try:
...     # perform file operations
...     pass
... finally:
...     f.close()
This way, we are guaranteed that the file is properly closed even if an exception is raised,
causing program flow to stop.
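To see that guarantee in action, here is a minimal sketch, assuming Python 3. The scratch file and the simulated ValueError are invented for the demonstration; the point is only that the finally block runs after the failure.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

f = open(path, "w", encoding="utf-8")
try:
    f.write("some data\n")
    # Simulate something going wrong while the file is open.
    raise ValueError("simulated failure while working with the file")
except ValueError as err:
    caught = str(err)
finally:
    # Runs whether or not the try block raised an exception.
    f.close()

print(f.closed)
```

Python's with statement (with open(...) as f:) gives the same guarantee in a more compact form.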
Writing to a File
In order to write into a file we need to open it in write 'w', append 'a' or exclusive creation 'x'
mode. We need to be careful with the 'w' mode as it will overwrite into the file if it already
exists. All previous data are erased.
Writing a string or sequence of bytes (for binary files) is done using write() method. This
method returns the number of characters written to the file.
Opening 'test.txt' in 'w' mode creates a new file if it does not exist. If it does exist, it is
overwritten. We must include the newline characters ('\n') ourselves to distinguish different
lines.
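A small sketch of writing in 'w' mode, assuming Python 3. The file is created in a temporary directory and the two lines of text are placeholders, not the contents used elsewhere in these notes.

```python
import os
import tempfile

# Hypothetical scratch location for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

f = open(path, "w", encoding="utf-8")
try:
    # write() returns the number of characters written, newline included.
    n1 = f.write("my first file\n")
    n2 = f.write("contains two lines\n")
finally:
    f.close()

print(n1, n2)
```

Without the explicit '\n' at the end of each string, both pieces of text would end up on the same line.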
Reading from a File
To read the content of a file, we must open the file in reading mode. There are various
methods available for this purpose. We can use the read(size) method to read size characters
(or bytes, in binary mode). If the size parameter is not specified, it reads and returns
everything up to the end of the file.
>>> f = open("test.txt", 'r')
>>> f.read(4)    # read the first 4 characters
'This'
>>> f.read(4)    # read the next 4 characters
' her'
>>> f.read()     # read in the rest till end of file
"e's the wattle,\n the emblem of our land."
>>> f.read()     # further reading returns an empty string
''
We can see that the read() method returns the newline as '\n'. Once the end of the file is
reached, we get an empty string on further reading.
The argument of write has to be a string, so if we want to put other values in a file, we have
to convert them to strings. The easiest way to do that is with str:
>>> x = 25
>>> fout.write(str(x))
An alternative is to use the format operator, %. When applied to integers, % is the modulus
operator. But when the first operand is a string, % is the format operator. The first operand is
the format string, which contains one or more format sequences, which specify how the
second operand is formatted. The result is a string.
For example, the format sequence '%d' means that the second operand should be formatted
as an integer (d stands for “decimal”):
>>> lion = 32
>>> '%d' % lion
'32'
The result is the string '32', which is not to be confused with the integer value 32. A format
sequence can appear anywhere in the string, so you can embed a value in a sentence.
>>> lion = 32
>>> 'I have spotted %d lion.' % lion
'I have spotted 32 lion.'
If there is more than one format sequence in the string, the second argument has to be a tuple.
Each format sequence is matched with an element of the tuple, in order. The following
example uses '%d' to format an integer, '%g' to format a floating-point number (don’t ask
why), and '%s' to format a string.
The number of elements in the tuple has to match the number of format sequences in the
string. Also, the types of the elements have to match the format sequences.
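A sketch of the format operator with a tuple, assuming Python 3. The values (3, 0.1, 'camels') are sample data, and the two failing cases illustrate the matching rules for count and type.

```python
# Three format sequences, so the second operand is a tuple of three elements.
sentence = "In %d years I have spotted %g %s." % (3, 0.1, "camels")
print(sentence)

# Too few elements in the tuple raises TypeError.
too_few = False
try:
    "%d %d %d" % (1, 2)
except TypeError:
    too_few = True

# An element of the wrong type also raises TypeError.
wrong_type = False
try:
    "%d" % "dollars"
except TypeError:
    wrong_type = True

print(too_few, wrong_type)
```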
The Python sys module provides access to any command-line arguments via sys.argv.
What is sys.argv?
sys.argv is the list of command-line arguments passed to the Python program. argv holds
all the items that come along via the command-line input; it is basically an array
holding the command-line arguments of our program. Don't forget that the counting starts at
zero (0), not one (1): sys.argv[0] is the script name.
#!/usr/bin/python3
import sys
print('Number of arguments:', len(sys.argv))
print('Argument list:', sys.argv)
As mentioned above, the first argument is always the script name, and it is also counted in
the number of arguments.
# file: hypotenuse.py
import sys
import math
x = float(sys.argv[1])  # value 5
y = float(sys.argv[2])  # value 12
print("Hypotenuse =", math.hypot(x, y))
Output:
$ python3 hypotenuse.py 5 12
Hypotenuse = 13.0
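A slightly more defensive variant can check the number of arguments before indexing into the list. This is only a sketch, assuming Python 3: the function name hypotenuse_from_args and the usage message are invented here, and a plain list stands in for sys.argv so the code can be run without a command line.

```python
import math

def hypotenuse_from_args(argv):
    """Compute the hypotenuse from a sys.argv-style list: [script, x, y]."""
    # argv[0] is the script name, so two values mean three items in total.
    if len(argv) != 3:
        raise SystemExit("usage: %s X Y" % argv[0])
    x = float(argv[1])
    y = float(argv[2])
    return math.hypot(x, y)

# Stand-in for sys.argv; in a real script, pass sys.argv instead.
example_argv = ["hypotenuse.py", "5", "12"]
print("Hypotenuse =", hypotenuse_from_args(example_argv))
```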
5.4 Exception Handling
There are (at least) two distinguishable kinds of errors: syntax errors and exceptions.
Syntax Errors
Syntax errors, also known as parsing errors, are perhaps the most common kind of complaint
you get while you are still learning Python.
The parser repeats the offending line and displays a little ‘arrow’ pointing at the earliest point
in the line where the error was detected. The error is caused by (or at least detected at) the
token preceding the arrow: in the example, the error is detected at the function print(), since a
colon (':') is missing before it. File name and line number are printed so you know where to
look in case the input came from a script.
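A syntax error of the kind described, a while statement with the colon missing before print(), can be reproduced safely by handing the offending line to the built-in compile() function and catching the resulting SyntaxError. This is a sketch assuming Python 3; the file name "<example>" is a placeholder.

```python
# The colon after "while True" is missing, so this line cannot be parsed.
source = "while True print('Hello world')"

detected = False
try:
    compile(source, "<example>", "exec")
except SyntaxError as err:
    detected = True
    error_file = err.filename
    error_line = err.lineno
    # The exception records the file name, line number and offending line.
    print("SyntaxError in", error_file, "line", error_line)
    print(err.text or source)
```

When the same line is typed at the interpreter prompt, Python prints this information itself, together with the arrow marking where the error was detected.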
Exceptions
What is an Exception?
An exception is an event, which occurs during the execution of a program that disrupts the
normal flow of the program's instructions. In general, when a Python script encounters a
situation that it cannot cope with, it raises an exception. An exception is a Python object that
represents an error. When a Python script raises an exception, it must either handle the
exception immediately or it terminates and quits.
If you have some suspicious code that may raise an exception, you can defend your program
by placing the suspicious code in a try: block. After the try: block, include an except:
statement, followed by a block of code which handles the problem as elegantly as possible.
Syntax:
try:
    You do your operations here;
    ......................
except ExceptionI:
    If there is ExceptionI, then execute this block.
except ExceptionII:
    If there is ExceptionII, then execute this block.
    ......................
else:
    If there is no exception then execute this block.
A single try statement can have multiple except statements. This is useful when the
try block contains statements that may throw different types of exceptions.
You can also provide a generic except clause, which handles any exception.
After the except clause(s), you can include an else-clause. The code in the else-block
executes if the code in the try: block does not raise an exception.
The else-block is a good place for code that does not need the try: block's protection.
Example
#!/usr/bin/python3
try:
    fh = open("testfile", "w")
    fh.write("This is my test file for exception handling!!")
except IOError:
    print("Error: can't find file or read data")
else:
    print("Written content in the file successfully")
    fh.close()
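A try statement with several except clauses and an else clause might look like the following sketch, assuming Python 3. The function divide and its return messages are made up for illustration.

```python
def divide(a_text, b_text):
    """Divide two numbers given as text, reporting which problem occurred."""
    try:
        result = int(a_text) / int(b_text)
    except ValueError:
        # Raised by int() when the text is not a whole number.
        return "not a number"
    except ZeroDivisionError:
        # Raised by the division when the second number is zero.
        return "division by zero"
    else:
        # Reached only when the try block raised no exception.
        return result

print(divide("6", "3"))
print(divide("x", "3"))
print(divide("6", "0"))
```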
You can also use the except statement with no exceptions defined as follows.
Syntax:
try:
    You do your operations here;
    ......................
except:
    If there is any exception, then execute this block.
    ......................
else:
    If there is no exception then execute this block.
This kind of a try-except statement catches all the exceptions that occur. Using this kind of
try-except statement is not considered a good programming practice though, because it
catches all exceptions but does not make the programmer identify the root cause of the
problem that may occur.
The except Clause with Multiple Exceptions
You can also use the same except statement to handle multiple exceptions as follows.
Syntax:
try:
    You do your operations here;
    ......................
except (Exception1[, Exception2[, ...ExceptionN]]):
    If there is any exception from the given exception list,
    then execute this block.
    ......................
else:
    If there is no exception then execute this block.
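As a concrete instance of one except clause handling several exception types, here is a sketch assuming Python 3. The helper name to_number is hypothetical.

```python
def to_number(value):
    """Convert value to float, or return None if it cannot be converted."""
    try:
        result = float(value)
    except (ValueError, TypeError):
        # ValueError: text that is not a number, e.g. "abc".
        # TypeError: an object float() cannot accept at all, e.g. None.
        return None
    else:
        return result

print(to_number("3.5"), to_number("abc"), to_number(None))
```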
You can use a finally: block along with a try: block. The finally block is a place to put any
code that must execute, whether the try-block raised an exception or not. The syntax of the
try-finally statement is this.
Syntax:
try:
    You do your operations here;
    ......................
    Due to any exception, this may be skipped.
finally:
    This would always be executed.
    ......................
An else clause can be used along with a finally clause only if at least one except clause is
also present.
Example
#!/usr/bin/python3
try:
    fh = open("testfile", "w")
    try:
        fh.write("This is my test file for exception handling!!")
    finally:
        print("Going to close the file")
        fh.close()
except IOError:
    print("Error: can't find file or read data")
When an exception is thrown in the try block, the execution immediately passes to the finally
block. After all the statements in the finally block are executed, the exception is raised again
and is handled in the except statements if present in the next higher layer of the try-except
statement.
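The order of events can be made visible by recording each step in a list. This is a sketch assuming Python 3; the function risky and the simulated IOError are invented for the demonstration.

```python
events = []

def risky():
    try:
        events.append("try")
        raise IOError("simulated failure")
    finally:
        # Runs first, before the exception leaves this function.
        events.append("finally")

try:
    risky()
except IOError as err:
    # The exception is raised again after finally and handled one level up.
    events.append("except: %s" % err)

print(events)
```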
Argument of an Exception
An exception can have an argument, which is a value that gives additional information about
the problem. The contents of the argument vary by exception. You capture an exception's
argument by supplying a variable in the except clause as follows.
Syntax:
try:
    You do your operations here;
    ......................
except ExceptionType as Argument:
    You can print value of Argument here...
If you write the code to handle a single exception, you can have a variable follow the name of
the exception in the except statement, after the keyword as. If you are trapping multiple
exceptions, you can have a variable follow the tuple of the exceptions.
This variable receives the exception object, which mostly carries the cause of the
exception. Its args attribute holds a single value or multiple values in the form of a tuple.
This tuple usually contains the error string, the error number, and an error location.
Example
#!/usr/bin/python
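A minimal sketch of capturing an exception's argument, assuming Python 3, where the variable is introduced with the keyword as. The input "xyz" is sample data.

```python
try:
    int("xyz")
except ValueError as argument:
    # The variable is bound to the exception object itself.
    message = str(argument)
    # The values passed to the exception are available as a tuple.
    details = argument.args

print("The argument does not contain numbers:", message)
print(details)
```

Here the exception carries a single value, the error string, so args is a one-element tuple.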
5.5 Modules
If you quit from the Python interpreter and enter it again, the definitions you have made
(functions and variables) are lost. Therefore, if you want to write a somewhat longer
program, you are better off using a text editor to prepare the input for the interpreter and
running it with that file as input instead. This is known as creating a script. As your program
gets longer, you may want to split it into several files for easier maintenance. You may also
want to use a handy function that you’ve written in several programs without copying its
definition into each program.
To support this, Python has a way to put definitions in a file and use them in a script or in an
interactive instance of the interpreter. Such a file is called a module; definitions from a
module can be imported into other modules or into the main module (the collection of
variables that you have access to in a script executed at the top level and in calculator mode).
A module is a file containing Python definitions and statements. The file name is the module
name with the suffix .py appended. Within a module, the module’s name (as a string) is
available as the value of the global variable __name__. For instance, use your favorite text
editor to create a file called fibo.py in the current directory with the following contents.
Example
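A possible version of fibo.py is sketched below, assuming Python 3. It follows the well-known Fibonacci module from the Python tutorial, with the loops written so that both functions start the series at 1.

```python
# Fibonacci numbers module (save as fibo.py)

def fib(n):
    """Print the Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a + b
    print()

def fib2(n):
    """Return the Fibonacci series up to n as a list."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result
```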
Now enter the Python interpreter and import this module with the following command:
>>> import fibo
This does not enter the names of the functions defined in fibo directly in the current symbol
table; it only enters the module name fibo there. Using the module name you can access the
functions.
>>> fibo.fib(1000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'
If you intend to use a function often you can assign it to a local name.
>>> fib = fibo.fib
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
When a module named spam is imported, the interpreter first searches for a built-in module
with that name. If not found, it then searches for a file named spam.py in a list of directories
given by the variable sys.path. sys.path is initialized from these locations:
1. The directory containing the input script (or the current directory).
2. PYTHONPATH (a list of directory names, with the same syntax as the shell variable
PATH).
3. The installation-dependent default.
After initialization, Python programs can modify sys.path. The directory containing the
script being run is placed at the beginning of the search path, ahead of the standard library
path. This means that scripts in that directory will be loaded instead of modules of the same
name in the library directory. This is an error unless the replacement is intended.
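The search path can be inspected and modified like any other list. The sketch below assumes Python 3, and the directory /path/to/my_modules is a placeholder that does not need to exist.

```python
import sys

# Show the first few entries of the module search path.
for entry in sys.path[:3]:
    print(repr(entry))

# Hypothetical extra directory holding our own modules.
extra_dir = "/path/to/my_modules"
if extra_dir not in sys.path:
    # Appending keeps the standard library ahead of our directory,
    # so our files cannot shadow library modules by accident.
    sys.path.append(extra_dir)
```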
5.6 Packages
Packages are namespaces which can themselves contain multiple packages and modules. They
are simply directories, but with a twist. Each package in Python is a directory which must
contain a special file called __init__.py. This file can be empty, and it indicates that the
directory containing it is a Python package, so it can be imported the same way a module can
be imported.
Consider a file Pots.py available in the Phone directory. This file has the following lines of
source code.
#!/usr/bin/python3
def Pots():
    print("I'm Pots Phone")
In the same way, we have two more files, Isdn.py and G3.py, each defining a function with the
same name as its file. Now create one more file, __init__.py, in the Phone directory:
Phone/__init__.py
To make all of your functions available when you've imported Phone, you need to put
explicit import statements in __init__.py as follows.
from .Pots import Pots
from .Isdn import Isdn
from .G3 import G3
After you add these lines to __init__.py, you have all of these functions available when you
import the Phone package.
#!/usr/bin/python3
import Phone
Phone.Pots()
Phone.Isdn()
Phone.G3()
In the above example, we have put a single function in each file, but you can keep multiple
functions in your files. You can also define different Python classes in those files and then
create your packages out of those classes.
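The whole Phone package can be tried end to end with the sketch below, assuming Python 3. It writes the package into a temporary directory so nothing needs to be created by hand. The messages for Isdn and G3 are made up, the functions return their text instead of printing it so the result can be checked, and __init__.py uses relative imports (from .Pots import Pots), which Python 3 requires inside a package.

```python
import importlib
import os
import sys
import tempfile

# Build the Phone package in a scratch directory (hypothetical location).
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "Phone")
os.mkdir(package_dir)

# File contents; the Isdn and G3 messages are invented for this sketch.
sources = {
    "Pots.py": "def Pots():\n    return \"I'm Pots Phone\"\n",
    "Isdn.py": "def Isdn():\n    return \"I'm ISDN Phone\"\n",
    "G3.py": "def G3():\n    return \"I'm 3G Phone\"\n",
    "__init__.py": (
        "from .Pots import Pots\n"
        "from .Isdn import Isdn\n"
        "from .G3 import G3\n"
    ),
}
for filename, source in sources.items():
    f = open(os.path.join(package_dir, filename), "w", encoding="utf-8")
    try:
        f.write(source)
    finally:
        f.close()

# Make the scratch directory importable, then import the package.
sys.path.insert(0, root)
importlib.invalidate_caches()
import Phone

print(Phone.Pots())
print(Phone.Isdn())
print(Phone.G3())
```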
Problem Solution
Source Code
Here is the source code of a Python program to count the number of words in a text file. The
program output is also shown below.
Program Explanation
1. User must enter a file name.
2. The file is opened using the open() function in the read mode.
3. A for loop is used to read through each line in the file.
4. Each line is split into a list of words using split().
5. The number of words in each line is counted using len() and the count variable is
incremented.
6. The number of words in the file is printed.
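The steps above can be sketched as follows, assuming Python 3. To keep the example self-contained, a small sample file is created first and its path stands in for the name the user would type; the function name count_words and the sample text are made up.

```python
import os
import tempfile

def count_words(fname):
    """Count the words in a text file, line by line."""
    num_words = 0
    f = open(fname, "r", encoding="utf-8")
    try:
        for line in f:
            words = line.split()     # split the line into a list of words
            num_words += len(words)  # add this line's word count
    finally:
        f.close()
    return num_words

# Sample file standing in for user input such as input("Enter file name: ").
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w", encoding="utf-8")
try:
    f.write("Hello world\nthis is a test file\n")
finally:
    f.close()

print("Number of words:", count_words(path))
```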
Problem Solution
Source Code
Here is the source code of a Python program to copy the contents of one file into another. The
program output is also shown below.
Program Explanation
1. The source file is opened using the open() function as the input stream fin.
2. The destination file is opened using the open() function in write mode as the output
stream fout.
3. Each line in the file is iterated over using a for loop (in the input stream).
4. Each of the iterated lines is written into the output file.
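A sketch of these steps, assuming Python 3. The function name copy_file, the file names, and the sample text are invented for illustration; the stream names fin and fout follow the explanation above.

```python
import os
import tempfile

def copy_file(src, dst):
    """Copy a text file line by line from src to dst."""
    fin = open(src, "r", encoding="utf-8")
    try:
        fout = open(dst, "w", encoding="utf-8")
        try:
            for line in fin:     # iterate over the input stream
                fout.write(line)  # write each line to the output file
        finally:
            fout.close()
    finally:
        fin.close()

# Sample source file in a scratch directory.
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "source.txt")
target_path = os.path.join(folder, "copy.txt")

f = open(source_path, "w", encoding="utf-8")
try:
    f.write("first line\nsecond line\n")
finally:
    f.close()

copy_file(source_path, target_path)

f = open(target_path, "r", encoding="utf-8")
try:
    copied = f.read()
finally:
    f.close()
print(copied)
```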