Working with Excel Spreadsheets
Python does not come with OpenPyXL, so you’ll have to install it. Follow the
instructions for installing third-party modules in Appendix A; the name of the
module is openpyxl. To test whether it is installed correctly, enter the following
into the interactive shell:
If the module was correctly installed, this should produce no error messages.
Remember to import the openpyxl module before running the interactive shell
examples in this chapter, or you’ll get a NameError: name 'openpyxl' is not
defined error.
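The installation check described above comes down to a single import statement. As a slightly more explicit sketch (the installed variable and the printed messages are illustrative, not part of OpenPyXL), the import can be wrapped so that a missing module is reported instead of raising an error:

```python
# Sketch of an installation check: try the import and report the outcome.
try:
    import openpyxl
except ImportError:
    installed = False
    print("openpyxl is not installed; see Appendix A for installation steps")
else:
    installed = True
    # Fall back to "unknown" in case a release does not expose __version__.
    print("openpyxl version:", getattr(openpyxl, "__version__", "unknown"))
```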
This book covers version 2.3.3 of OpenPyXL, but new versions are regularly
released by the OpenPyXL team. Don’t worry, though: New versions should stay
backward compatible with the instructions in this book for quite some time. If you
have a newer version and want to see what additional features may be available
to you, you can check out the full documentation for OpenPyXL
at http://openpyxl.readthedocs.org/.
Figure 12-1. The tabs for a workbook’s sheets are in the lower-left corner of Excel.
Sheet 1 in the example file should look like Table 12-1. (If you didn’t
download example.xlsx from the website, you should enter this data into the
sheet yourself.)
Table 12-1. The example.xlsx Spreadsheet
     A                        B               C
1    4/5/2015 1:34:02 PM      Apples          73
2    4/5/2015 3:41:23 AM      Cherries        85
3    4/6/2015 12:46:51 PM     Pears           14
4    4/8/2015 8:59:43 AM      Oranges         52
5    4/10/2015 2:07:00 AM     Apples          152
6    4/10/2015 6:10:37 PM     Bananas         23
7    4/10/2015 2:40:46 AM     Strawberries    98
Now that we have our example spreadsheet, let’s see how we can manipulate it
with the openpyxl module.
You can get a list of all the sheet names in the workbook by calling
the get_sheet_names() method. Enter the following into the interactive shell:
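As a rough sketch of listing sheet names, the snippet below builds a three-sheet workbook in memory rather than loading example.xlsx, so the sheet titles are assumptions made for illustration. It reads the sheetnames property, which recent OpenPyXL releases provide for the same purpose as get_sheet_names(); on the 2.3.3 release this book covers, the method call named above is the documented route.

```python
import openpyxl

# Stand-in for example.xlsx: an in-memory workbook with three sheets.
# The titles "Sheet1" to "Sheet3" are assumed for this sketch.
wb = openpyxl.Workbook()
wb.active.title = "Sheet1"
wb.create_sheet(title="Sheet2")
wb.create_sheet(title="Sheet3")

# List every sheet name in tab order.
names = wb.sheetnames
print(names)

# A Worksheet object can be looked up by name with indexing,
# and the active attribute returns the currently selected sheet.
third_sheet = wb["Sheet3"]
print(third_sheet.title, wb.active.title)
```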
Once you have a Worksheet object, you can access a Cell object by its name.
Enter the following into the interactive shell:
OpenPyXL will automatically interpret the dates in column A and return them
as datetime values rather than strings. The datetime data type is explained
further in Chapter 16.
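The following sketch illustrates cell access by name and the date handling described above. To stay runnable without example.xlsx, it writes the first row of Table 12-1 to an in-memory buffer and loads it back; the buffer round trip and the variable names are assumptions of this example, not steps from the chapter.

```python
import datetime
import io

import openpyxl

# Write the first row of Table 12-1 into an in-memory .xlsx file.
wb_out = openpyxl.Workbook()
wb_out.active.append([datetime.datetime(2015, 4, 5, 13, 34, 2), "Apples", 73])
buffer = io.BytesIO()
wb_out.save(buffer)
buffer.seek(0)

# Load it back, as you would load example.xlsx from disk.
wb = openpyxl.load_workbook(buffer)
sheet = wb.active

# Index the Worksheet with a cell name to get a Cell object.
a1 = sheet["A1"]
b1 = sheet["B1"]

# Column A comes back as a datetime value, not a string.
print(type(a1.value).__name__, a1.value)
print("Cell", b1.coordinate, "in row", b1.row, "holds", b1.value)
```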
Say you want to go down column B and print the value in every cell with an odd
row number. By passing 2 for the range() function’s “step” parameter, you can get
cells from every second row (in this case, all the odd-numbered rows).
The for loop’s i variable is passed for the row keyword argument to
the cell() method, while 2 is always passed for the column keyword argument.
Note that the integer 2, not the string 'B', is passed. For the example
spreadsheet, such a loop prints each odd row number next to that row’s column B
value:
1 Apples
3 Pears
5 Apples
7 Strawberries
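A minimal sketch of such a loop follows. It fills column B of an in-memory sheet with the fruit names from Table 12-1 instead of loading example.xlsx; the FRUITS list and the odd_rows variable are introduced only for this illustration.

```python
import openpyxl

# Column B of Table 12-1, used as a stand-in for example.xlsx.
FRUITS = ["Apples", "Cherries", "Pears", "Oranges",
          "Apples", "Bananas", "Strawberries"]

wb = openpyxl.Workbook()
sheet = wb.active
for row_number, fruit in enumerate(FRUITS, start=1):
    sheet.cell(row=row_number, column=2, value=fruit)

# range(1, 8, 2) yields 1, 3, 5, 7: every odd-numbered row.
odd_rows = []
for i in range(1, 8, 2):
    value = sheet.cell(row=i, column=2).value  # column 2 is column B
    odd_rows.append((i, value))
    print(i, value)
```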
Note that the max_column attribute returns an integer rather than the letter that
appears in Excel.
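To make the integer result concrete, here is a small sketch that reads a sheet's size. It rebuilds Table 12-1 in memory as a stand-in for example.xlsx; the ROWS list is an assumption of this example.

```python
import datetime

import openpyxl

# Table 12-1, rebuilt in memory as a stand-in for example.xlsx.
ROWS = [
    (datetime.datetime(2015, 4, 5, 13, 34, 2), "Apples", 73),
    (datetime.datetime(2015, 4, 5, 3, 41, 23), "Cherries", 85),
    (datetime.datetime(2015, 4, 6, 12, 46, 51), "Pears", 14),
    (datetime.datetime(2015, 4, 8, 8, 59, 43), "Oranges", 52),
    (datetime.datetime(2015, 4, 10, 2, 7, 0), "Apples", 152),
    (datetime.datetime(2015, 4, 10, 18, 10, 37), "Bananas", 23),
    (datetime.datetime(2015, 4, 10, 2, 40, 46), "Strawberries", 98),
]

wb = openpyxl.Workbook()
sheet = wb.active
for row in ROWS:
    sheet.append(row)

# Both attributes are plain integers: 7 rows and 3 columns here.
print(sheet.max_row, sheet.max_column)
```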
After you import the get_column_letter() and column_index_from_string()
functions from the openpyxl.cell module (newer OpenPyXL releases expose them
in openpyxl.utils instead), you can call get_column_letter() and pass it an
integer like 27 to figure out what the letter name of the 27th column is. The
function column_index_from_string() does the reverse: You pass it the letter
name of a column, and it tells you what number that column is. You don’t need
to have a workbook loaded to use these functions. If you want, you can load a
workbook, get a Worksheet object, and read a Worksheet object attribute
like max_column to get an integer. Then, you can pass that integer
to get_column_letter().
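The conversions might look like the sketch below. It imports both functions from openpyxl.utils, which is where recent releases keep them; with the 2.3.3 release this book covers, the import would come from openpyxl.cell instead. The one-cell workbook is only a stand-in for a loaded spreadsheet.

```python
import openpyxl
from openpyxl.utils import column_index_from_string, get_column_letter

# Number to letter: the 27th column comes right after Z.
print(get_column_letter(1))    # A
print(get_column_letter(27))   # AA

# Letter to number: the reverse conversion.
print(column_index_from_string("A"))   # 1
print(column_index_from_string("AA"))  # 27

# Combine with max_column: put one value in column C of a fresh sheet,
# then turn the integer column count back into a letter.
wb = openpyxl.Workbook()
sheet = wb.active
sheet["C1"] = 73
last_letter = get_column_letter(sheet.max_column)
print(last_letter)  # C
```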
To print the values of each cell in the area, we use two for loops. The
outer for loop goes over each row in the slice ❶. Then, for each row, the
nested for loop goes through each cell in that row ❷.
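A sketch of those two loops is shown here, with the ❶ and ❷ markers placed in comments. The slice bounds 'A1':'C3' and the three in-memory rows (taken from Table 12-1) are assumptions chosen for illustration, as are the variable names.

```python
import datetime

import openpyxl

# First three rows of Table 12-1, standing in for example.xlsx.
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append([datetime.datetime(2015, 4, 5, 13, 34, 2), "Apples", 73])
sheet.append([datetime.datetime(2015, 4, 5, 3, 41, 23), "Cherries", 85])
sheet.append([datetime.datetime(2015, 4, 6, 12, 46, 51), "Pears", 14])

# Slicing the sheet yields one tuple of Cell objects per row.
seen = []
for row_of_cells in sheet["A1":"C3"]:      # ❶ outer loop: each row in the slice
    for cell_obj in row_of_cells:          # ❷ inner loop: each cell in that row
        print(cell_obj.coordinate, cell_obj.value)
        seen.append(cell_obj.coordinate)
    print("--- END OF ROW ---")
```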
To access the values of cells in a particular row or column, you can also use
a Worksheet object’s rows and columns attributes. Enter the following into the
interactive shell:
Apples
Cherries
Pears
Oranges
Apples
Bananas
Strawberries
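The column B values listed above could be produced by a loop like the sketch below. It wraps sheet.columns in list() before indexing, because recent OpenPyXL releases return a generator from that attribute; wrapping also works if the attribute is already a sequence. The SALES list and the placeholder text in column A are assumptions of this example.

```python
import openpyxl

# Columns B and C of Table 12-1; column A gets a placeholder string
# because the timestamps are not needed for this sketch.
SALES = [("Apples", 73), ("Cherries", 85), ("Pears", 14), ("Oranges", 52),
         ("Apples", 152), ("Bananas", 23), ("Strawberries", 98)]

wb = openpyxl.Workbook()
sheet = wb.active
for fruit, sold in SALES:
    sheet.append(["(timestamp)", fruit, sold])

# Index 1 is the second column, that is, column B.
column_b = list(sheet.columns)[1]
for cell_obj in column_b:
    print(cell_obj.value)
```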
As a quick review, here’s a rundown of all the functions, methods, and data types
involved in reading a cell out of a spreadsheet file:
1. Import the openpyxl module.
2. Call the openpyxl.load_workbook() function.
3. Get a Workbook object.
4. Read the active member variable or call
the get_sheet_by_name() workbook method.
5. Get a Worksheet object.
6. Use indexing or the cell() sheet method with row and column keyword
arguments.
7. Get a Cell object.
8. Read the Cell object’s value attribute.
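The eight steps above can be sketched end to end as follows. So that the snippet runs without the downloaded file, it first writes a one-row stand-in for example.xlsx into a temporary directory; that setup, the file path, and the variable names are assumptions of this example. Looking up the sheet with wb["Sheet1"] is shown as an alternative to get_sheet_by_name() that recent releases support.

```python
import datetime
import os
import tempfile

import openpyxl  # step 1: import the module

with tempfile.TemporaryDirectory() as folder:
    # Setup only: create a small stand-in for example.xlsx.
    path = os.path.join(folder, "example.xlsx")
    wb_out = openpyxl.Workbook()
    wb_out.active.title = "Sheet1"
    wb_out.active.append([datetime.datetime(2015, 4, 5, 13, 34, 2), "Apples", 73])
    wb_out.save(path)

    # Steps 2 and 3: load the file and get a Workbook object.
    wb = openpyxl.load_workbook(path)

# Steps 4 and 5: get a Worksheet object from the active attribute
# or by looking the sheet up by name.
sheet = wb.active
same_sheet = wb["Sheet1"]

# Steps 6 and 7: get a Cell object by indexing or with cell().
cell = sheet["B1"]
also_cell = sheet.cell(row=1, column=2)

# Step 8: read the Cell object's value attribute.
print(cell.value, also_cell.value)
```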