Adding OCR layers to your Zotero library PDF items for Metadata extraction and indexing

Zotero is a cross-platform literature manager that is able to sync to a remote server and across multiple user devices. There are many alternatives available, each with strengths and weaknesses, but I am currently using Zotero to manage my literature because it is free and works with WebDAV for additional free storage. In this article I will describe why optical character recognition (OCR) is important for Zotero and suggest a way to add OCR to existing items in a Zotero library. However, the method actually works for any collection of PDF files on your computer! ...

November 30, 2018 · 8 min · Simon

ASCII plotting on the command line terminal with eplot

If you want to plot something on the terminal in ASCII you can use “eplot”. eplot is a Ruby script that acts as a frontend for gnuplot and can be downloaded from the project’s GitHub page. It makes it easier to pipe numbers into gnuplot, which can otherwise be a bit of a hassle. It also has a dumb terminal mode that lets us plot in ASCII. Plotting like this is a quick way to check data files without an X Window System, which might not be available when logging in remotely over the terminal. ...

August 8, 2018 · 1 min · Simon

Fetching, wrangling and visualising sunrise and sunset data using Python

Previously I showed how it was possible to obtain sunrise and sunset times for a whole year at any location on Earth, from a public source. This time I am going to explain how to fetch that data, clean it up and create graphical visualizations like the one below, all using Python. A Jupyter Notebook is available on GitHub. Such data might even be useful in, for example, simulation of solar power generation. ...

February 2, 2018 · 11 min · Simon

How to get Sunrise and Sunset times for data analysis

Image: jinsngjung / Pixabay

Today is 2nd January and the days are getting longer again. I was thinking about sunrise and sunset times and wondering if I should do some data analysis and plotting of these to visualize how they change over the course of a year. Of course, the first step is to get a suitable data set. You can get a data set of sunrise and sunset times from this page on the US Naval Observatory (USNO) website. Sunrise and sunset times depend on your specific location on the globe, so you have to specify the location for your data set. Using the “Form B - Locations Worldwide” section on the page it is possible to enter precise coordinates for any location in terms of latitude and longitude. You can also do it for a timezone, but this is very broad and not precise enough for my liking. ...

January 2, 2018 · 2 min · Simon

Configuring KNIME to work with Python 2.7.x on Windows

UPDATE: These days it is recommended to use Python 3 instead of Python 2.

Apparently it is tricky to get Python integration working in the KNIME Analytics Platform. If you read the official guide too quickly you can miss some critical information at the bottom of the page. I was getting an error complaining that the google.protobuf library was missing even though I thought that I had everything installed correctly: “Library google.protobuf is missing, required minimum version is 2.5.0” ...

August 23, 2017 · 3 min · Simon

Successfully clearing ports in Salome (Code ASTER)

Figure: Building a geometry in the Salome graphical user interface (GUI).

How Salome tracks ports: When Salome is starting up, it checks for free ports on your system using a few built-in Python scripts. Then when you close Salome those ports should be freed up again for the next session. This has a number of uses, but one reason is to stop multiple instances of Salome trying to use the same port at once. ...

June 1, 2017 · 7 min · Simon
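
The free-port check mentioned in the Salome entry above can be sketched with Python's standard socket module. This is not Salome's own script: the function names is_port_free and first_free_port, the host 127.0.0.1 and the port range used in the example are all assumptions made for illustration.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


def first_free_port(start, count, host="127.0.0.1"):
    """Return the first free port in [start, start + count), or None if all are busy."""
    for port in range(start, start + count):
        if is_port_free(port, host):
            return port
    return None


if __name__ == "__main__":
    # Hypothetical range, chosen only for the demonstration.
    print(first_free_port(50000, 10))
```

A check like this only tells you the port was free at the moment of the test, which is why an application that wants to reserve a port also has to record the reservation somewhere and release it on exit.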

How to get up-to-date Python packages without bothering your cluster admin

If you have ever been stuck as a user on an out-of-date cluster without root access, you will know how frustrating it is to ask the admin to install packages for you. Even if they respond, by the time they get round to it you might have moved onto something else. The moment could be gone. Luckily, as far as Python is concerned, the pyenv project allows users to install their own local Python version or even assign different versions to different directories/projects. ...

September 1, 2016 · 1 min · Simon

Plotting multivariate data with Matplotlib/Pylab: Edgar Anderson’s Iris flower data set

The problem of how to visualize multivariate data sets is something I often face in my work. When using numerical optimization we might have a single objective function and multiple design variables that can be represented by columnar data in the form {x1, x2, x3, … xn, y} a.k.a. NXY. With design spaces of more than a few dimensions it is difficult to visualize them in order to estimate the relationship between each independent variable and the objective, or perform a sensitivity study. ...

August 31, 2016 · 4 min · Simon
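
As a numerical companion to plotting, the relationship between each design variable and the objective in NXY data can be roughly estimated with a correlation coefficient. The sketch below is an assumption-laden illustration rather than the method used in the post: the helper names pearson and sensitivity and the sample rows are hypothetical, and a linear correlation only captures linear trends.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def sensitivity(rows):
    """Given rows of the form [x1, ..., xn, y], correlate each xi with y."""
    *x_columns, y_column = zip(*rows)  # transpose rows into columns
    return [pearson(x, y_column) for x in x_columns]


if __name__ == "__main__":
    # Made-up data: y depends exactly on x1 and only weakly on x2.
    rows = [[1, 5, 2], [2, 3, 4], [3, 5, 6], [4, 3, 8]]
    for i, r in enumerate(sensitivity(rows), start=1):
        print(f"x{i}: r = {r:+.3f}")
```

Values near +1 or -1 suggest a strong linear dependence of the objective on that variable, while values near 0 suggest the variable is worth a closer look in a plot before drawing conclusions.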

The new default colormap for matplotlib is called “viridis” and it’s great!

As is known by anyone in the field of data visualization, the “jet” colormap has some flaws: it doesn’t work when printed in black & white; it doesn’t work well for colourblind people; and it is not linear in colour space, so it’s hard to estimate numerical values from the resulting image. The Matlab team recently developed a new colormap called “parula” but since Matlab is commercially-licensed software, its use is restricted. The Matplotlib team have therefore developed their own version, based on the principles of colour theory (covered in my own BSc lecture courses on Visualization). The new colormap is named “viridis” and it will become the default starting with Matplotlib v2.0. Users of the older v1.5.1 can still choose viridis manually using cmap=plt.cm.viridis. ...

April 6, 2016 · 1 min · Simon

5 Tips for making finite element models with Salome

Salome is an open source software package used to create geometric models and finite element meshes for use in numerical simulations. It is also able to perform its own numerical simulations and has post-processing capabilities built in. Here are my 5 tips for anyone who is interested in using Salome for model and mesh creation.

1. Practice manually first. This goes without saying. Although Salome has a powerful Python-based scripting capability, it is worth practicing with manual model generation. By that I mean clicking with your mouse in the GUI. Manual practice lets you get familiar with the quirks of the Salome workflow, which has a different mentality to many other model generator programs. ...

August 15, 2015 · 3 min · Simon

Firefox search bar – setting the region for Google searches

The problem: If you are visiting/living abroad but still want the Firefox search bar to default to your home version of Google, it is possible to fix it! In the following solution I assume that you are from the UK and want to use the UK/GB version of Google Search.

The solution: Access the Firefox settings: type about:config in the address bar, then click “I'll be careful, I promise”. For each setting you want to modify, use the search bar to find it more quickly. Set both browser.search.countryCode and browser.search.region to “GB” by double-clicking each one and typing the new value. Make sure browser.search.isUS is set to false. For both distribution.searchplugins.defaultLocale and general.useragent.locale use “en-GB”. Restart Firefox. Visit http://www.google.com/ncr to activate a “no country redirect” cookie that stops the browser from looking for country-specific search results. Searching from the search bar should now give results at Google.com, but notice that at the very end of the search results URL there is a flag for UK regional content. ...

April 13, 2015 · 1 min · Simon

Cool code: plotting columns from many data files with Grace

Grace a.k.a. xmgrace is a really useful tool for plotting histograms from tabular data files. Its power comes from being controllable from the command line and scriptable. Yes, there are other options which are sometimes more suitable for specific situations (e.g. gnuplot, Matplotlib/PyLab), but for quick, basic plotting I usually find myself relying on xmgrace. Here is an example of a single line command to plot two columns from each of a large number of data files: ...

November 27, 2014 · 2 min · Simon

Aligning qhost output on the commandline when hostnames are too damn long

qhost is a UNIX command line tool to print the status of nodes on a Grid Engine system. The output is normally quite readable and is sorted by columns to give information on the hostname (“HOSTNAME”), architecture (“ARCH”), no. of CPUs (“NCPU”), processor load (“LOAD”), total available memory (“MEMTOT”), current memory usage (“MEMUSE”), swap memory size (“SWAPTO”) and current swap usage (“SWAPUS”) of each node on the cluster. Unfortunately, when the hostnames are too long, instead of truncating them to keep the columns aligned the row gets shunted along, making the output messy and much harder to read quickly. ...

September 24, 2014 · 2 min · Simon
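
One way to repair the misaligned qhost output described above is to re-pad every column to the width of its longest entry. The following is a minimal sketch, not the solution from the post: the function name align_columns is hypothetical, the sample text is made up (only the column headings come from the description above), and it assumes that no field contains whitespace.

```python
def align_columns(text, gap=2):
    """Re-pad whitespace-separated columns so that every row lines up."""
    lines = [line for line in text.splitlines() if line.strip()]
    # A line made only of dashes is treated as a horizontal rule, not data.
    is_rule = [set(line.strip()) == {"-"} for line in lines]
    rows = [line.split() for line in lines]
    data_rows = [row for row, rule in zip(rows, is_rule) if not rule]
    if not data_rows:
        return text

    # Width of each column is the length of its longest cell.
    n_cols = max(len(row) for row in data_rows)
    widths = [0] * n_cols
    for row in data_rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    total_width = sum(widths) + gap * (n_cols - 1)

    out = []
    for row, rule in zip(rows, is_rule):
        if rule:
            out.append("-" * total_width)  # redraw the rule at the new width
        else:
            padded = (cell.ljust(width) for cell, width in zip(row, widths))
            out.append((" " * gap).join(padded).rstrip())
    return "\n".join(out)


# Made-up sample resembling qhost output with one overlong hostname.
SAMPLE = """\
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------
global - - - - - - -
node01 lx26-amd64 8 0.01 15.7G 1.2G 2.0G 0.0
a-very-long-hostname.cluster.example.org lx26-amd64 8 7.95 15.7G 9.8G 2.0G 12.0M
"""

if __name__ == "__main__":
    print(align_columns(SAMPLE))
```

Widening the column rather than truncating the hostname keeps the full name visible, at the cost of longer lines; truncating to a fixed width would be the alternative if the terminal is narrow.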

ZotFile for syncing PDF articles from Zotero to my eReader

I use Zotero to manage my literature collection, including all the associated PDF attachments. It really made my life easier when I set up the WebDAV file sync on Box. However, until now the only way to sync files to my Onyx Boox M96 eReader (image) was by connecting a USB cable and copying them manually to the device. Since Zotero stores the files in cryptically-named individual folders it is hard to do this manually in an organised manner and involves lots of clicking. Today I am going to find a better way. ...

September 17, 2014 · 5 min · Simon

Syncing Zotero files with WebDAV from Box

It’s hard to stay organized when you work on multiple computers, with multiple operating systems. My main notebook is a dual boot Ubuntu/Win7 machine where I have a shared partition for work files. I sync my work folder with my Ubuntu tower PC via BitTorrent Sync. This has been working well for some time, with the syncing happening under Ubuntu only. That is a drawback, but if BTSync under Windows also tries to sync the same shared folder it causes problems, so I avoided doing so. However, if you start with two identical copies of the folder on the two PCs it still wants to sync all of the files one way over the network. Not good when the folder is 100 GB large! (Update: this may have been solved in the latest version of the software). ...

August 6, 2014 · 3 min · Simon

Salome reordering during scripted Explode function

When writing Salome scripts that include a step to explode objects to their sub-shapes (using “ExtractShapes”) it is worth paying attention to the isSorted parameter, which is True by default. In my experience this parameter is best set to False in order to avoid Salome unpredictably changing the order of the objects in the resulting list. For example, here I have a Compound of two objects A and B, which has the faces glued resulting in Glue_1. I want to extract the two solids and use them in the script. ...

November 29, 2013 · 1 min · Simon