Reading an OpenOffice spreadsheet into Python

Sometimes the best way to manage simulation output data is in a spreadsheet. Or the data you want to use is already in a spreadsheet.

Before reading it into Python for numerical analysis or plotting, or reading into xmgrace to plot, etc, it is normal to export the spreadsheet to a CSV or tab-separated file. But then you have two files. And if you do something in the spreadsheet you have to export it again and this is inefficient.

Here is a way to read an .ods file into a Python array:

  1. Install ODFPY. If you are running Ubuntu this can be found in the official software repository under the package name “python-odf”.
  2. Get the ODSReader script by Marco Conti . But wait! There is a patched version here that has some additional benefits, such as the ability to include empty cells.

To use it you will need to import the ODSReader class from ODSReader.py. Here is an example by myself where I cut off some unnecessary empty lines at the end:

#!/usr/bin/python

# Read an openoffice spreadsheet into a python array. 
# Uses python-odf a.k.a. ODFpy from Ubuntu repository.
# Uses ODSReader.py by Marko Conti (Apach License 2.0)
# Perform operations on the data using numpy

import sys
import numpy as np
sys.path.append("/mnt/FILES/WORK/scripts/ODSReader")
from ODSReader_simon import ODSReader as ods

def ReadODS(inFile, sheetName, cutEmpty=True):
    """
    Returns a given sheet from a given .ods file into a python array (list of lists). 
    """
    a = ods(inFile)
    sheet = a.getSheet(sheetName)
    # cut out empty lines for precision:
    if cutEmpty:
        removeList = []
        for index, line in enumerate(sheet):
            # If any of the items are filled (true) then don' add the line to the remove list. 
            b = False
            for item in line:
                if item:
                    b = True
            if not b:
                removeList.append(index)
        # Now we know which lines are empty, we can cut them out. 
        # Do it in reverse order because otherwise you will cut the wrong indices out. 
        removeList.reverse()
        for i in removeList:
            del sheet[i]
    return sheet
a = np.array(ReadODS("file_name.ods", "sheet_name", cutEmpty=True))

The numpy array now contains an array of strings so I modified the OSDReader.py script to eval() each cell item before returning the array. Eval() is safe if you are using your own data but can in some cases be exploited for malicious purposes; there is a lot of discussion about this online.