Monthly archives: February, 2014

Reading an OpenOffice spreadsheet into Python

Sometimes the best way to manage simulation output data is in a spreadsheet. Or the data you want to use is already in a spreadsheet.

Before reading it into Python for numerical analysis or plotting, or reading into xmgrace to plot, etc, it is normal to export the spreadsheet to a CSV or tab-separated file. But then you have two files. And if you do something in the spreadsheet you have to export it again and this is inefficient.

Here is a way to read an .ods file into a Python array:

  1. Install ODFPY. If you are running Ubuntu this can be found in the official software repository under the package name “python-odf”.
  2. Get the ODSReader script by Marco Conti . But wait! There is a patched version here that has some additional benefits, such as the ability to include empty cells.

To use it you will need to import the ODSReader class from ODSReader.py. Here is an example by myself where I cut off some unnecessary empty lines at the end:

#!/usr/bin/python

# Read an openoffice spreadsheet into a python array. 
# Uses python-odf a.k.a. ODFpy from Ubuntu repository.
# Uses ODSReader.py by Marko Conti (Apach License 2.0)
# Perform operations on the data using numpy

import sys
import numpy as np
sys.path.append("/mnt/FILES/WORK/scripts/ODSReader")
from ODSReader_simon import ODSReader as ods

def ReadODS(inFile, sheetName, cutEmpty=True):
    """
    Returns a given sheet from a given .ods file into a python array (list of lists). 
    """
    a = ods(inFile)
    sheet = a.getSheet(sheetName)
    # cut out empty lines for precision:
    if cutEmpty:
        removeList = []
        for index, line in enumerate(sheet):
            # If any of the items are filled (true) then don' add the line to the remove list. 
            b = False
            for item in line:
                if item:
                    b = True
            if not b:
                removeList.append(index)
        # Now we know which lines are empty, we can cut them out. 
        # Do it in reverse order because otherwise you will cut the wrong indices out. 
        removeList.reverse()
        for i in removeList:
            del sheet[i]
    return sheet
a = np.array(ReadODS("file_name.ods", "sheet_name", cutEmpty=True))

The numpy array now contains an array of strings so I modified the OSDReader.py script to eval() each cell item before returning the array. Eval() is safe if you are using your own data but can in some cases be exploited for malicious purposes; there is a lot of discussion about this online.




LaTeX overfull hbox errors (and how I fixed mine)

If you have ever written a journal article or thesis using LaTeX then you probably came across lots of errors and warnings in the process. Those warnings can usually be ignored as they don’t stop the document from compiling, and many of us who just want to cross the finishing line probably never investigate what was causing them.

One such common warning is the Overfull or Underfull /hbox message.

These usually occur through no fault of the author, but because LaTeX doesn’t know how to hy-ph-enate certain words. You see, LaTeX tries to typeset in a strict column width and has a dictionary of words that includes hyphenation information that tells it where the most logical places to split a word up are, with some order of priority. When it wants to flow text at the end of a line it can and does automatically hyphenate (usually longer) words to keep the character spacing constant while adhering to the strict column width.

What I have found is that certain words I was using were not in the LaTeX list of words and the /hbox problems were caused by LaTeX not knowing how to hyphenate them. Thus the column widths were being messed up. Such special words for me were micromagnetics, intergranular and so on.

By supplying forced hyphenation marks manually, e.g. micro\-mag\-net\-ics I was able to get rid of all the warnings.