Category: My work

Configuring KNIME to work with Python 2.7.x on Windows

Apparently it is tricky to get Python integration working in the KNIME Analytics Platform. If you read the official guide too quickly, you can miss some critical information at the bottom of the page. I was getting an error complaining that the google.protobuf library was missing, even though I thought I had everything installed correctly:

Library google.protobuf is missing, required minimum version is 2.5.0

Here is the correct sequence of actions to fix this problem and get Python integration working in KNIME. It worked for me, so I hope it works for you too!

  1. If you don’t already have it on your system, download and install Anaconda Python. I use the Python 2.7 version of Anaconda. I recommend installing Anaconda in the folder “C:\Anaconda2”.
  2. Create a new conda environment by entering the following command at the Windows Command Prompt (or PowerShell):
    conda create -y -n py27_knime python=2.7 pandas jedi protobuf
  3. Create a new Windows batch file with the following contents:
    @REM Adapt the directory in the PATH to your system
    @SET PATH=C:\Anaconda2\Scripts;%PATH%
    @CALL activate py27_knime || ECHO Activating py27_knime failed
    @python %*

    Of course, you might have to change this if you installed Anaconda2 to a different path. Save the batch file somewhere convenient, such as your user directory or the KNIME workspace folder. I named my file “py27.bat” and placed it in the knime-workspace folder.
    Just in case anyone reading this is confused: a Windows batch file is just a text file saved with the file extension “.bat”. You can create it in any text editor by pasting the above four lines into a new, empty text file and saving it as “py27.bat”.

  4. If you haven’t already, download and install the KNIME Analytics Platform. At the time of writing, the latest version is 3.4.0, so that’s the one I used. You might as well get the installer with all the free extensions included, presuming you have enough disk space and a decent internet connection. This saves having to install further extensions later, although to be fair KNIME does a fairly good job of installing them on the fly (after asking you first) whenever you load a workflow that needs them.

    Downloading KNIME

  5. Start KNIME. Go to File > Preferences, then KNIME > Python scripting. In the “path to the local python executable” field, paste the full path to your batch file, e.g. “C:\Users\yourname\knime-workspace\py27.bat”. To be future-proof, also do this under KNIME > Python (Labs).
    KNIME now calls our batch file instead of calling the system-wide Python executable directly. The batch file activates the “py27_knime” environment before calling Python. This makes sure that the additional libraries required by KNIME are available.
    I guess you could also get your usual Python environment working with KNIME by installing additional packages, but let’s just do what the knowledgeable guys at KNIME have suggested this time. 🙂

    Python scripting preferences for KNIME Analytics Platform in Windows

  6. Now restart KNIME. Try opening or creating a workflow with a node that uses some Python code. It should work now! If it doesn’t, the quick sanity check below may help you track down what is missing.
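If KNIME still complains about google.protobuf after these steps, it can help to inspect the environment directly, using the batch file so that Python sees exactly what KNIME sees. Here is a minimal sanity-check script (the file name check_env.py is just an example):

# check_env.py: verify that the py27_knime environment provides what KNIME needs.
import google.protobuf
import pandas
import jedi

# KNIME's error message asks for protobuf version 2.5.0 or higher;
# recent protobuf packages expose __version__.
print "protobuf:", google.protobuf.__version__
print "pandas:  ", pandas.__version__
print "jedi:    ", jedi.__version__

Run it as py27.bat check_env.py from the command prompt; if any of the imports fail, the py27_knime environment was not created or activated correctly.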



New Paper: Role of twin and anti-phase defects in MnAl permanent magnets

Our latest paper has just been published in the journal Acta Materialia. In it, we compare finite-element micromagnetic simulations with experimental evidence in order to investigate the role of twin and anti-phase defects in the reduced performance of MnAl permanent magnets.

Room-temperature (BH)max as a function of approximate raw-material costs for the theoretical MnAl permanent magnet, together with experimental values for a selection of common commercial permanent magnets. The raw-material costs are a good indication of the relative cost of the manufactured magnets.

The preprint version is available on arxiv.org.

Twin domain boundary nucleation field H_nuc and de-pinning fields H_depin,L and H_depin,R for left and right initial positions, as a function of twinning angle θ.



Plotting multivariate data with Matplotlib/Pylab: Edgar Anderson’s Iris flower data set

The problem of how to visualize multivariate data sets is something I often face in my work. When using numerical optimization we might have a single objective function and multiple design variables that can be represented by columnar data in the form {x1, x2, x3, … xn, y}, a.k.a. NXY. With more than a few dimensions it becomes difficult to visualize the design space in order to estimate the relationship between each independent variable and the objective, or to perform a sensitivity study.
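To make the NXY idea concrete, here is a minimal sketch of such data loaded into pandas; the column names and values are made up purely for illustration:

import pandas as pd

# Hypothetical NXY data: n design variables followed by one objective column.
df = pd.DataFrame({'x1': [0.1, 0.2, 0.3],
                   'x2': [1.0, 0.5, 0.25],
                   'y':  [3.2, 2.9, 3.6]},
                  columns=['x1', 'x2', 'y'])

design_vars = list(df.columns[:-1])  # everything except the objective
print "Design variables:", design_vars, "objective:", df.columns[-1]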


While perusing recent work in and tools for visualizing such data I stumbled across some nice examples of multivariate data plotting using a famous data set known as the “Iris data set”, also known as Fisher’s Iris data set or Edgar Anderson’s Iris flower data set. It contains data from 50 flowers each of three different flower species, collected in the Gaspé Peninsula. This set is not in the NXY form typical of optimization routines; instead each flower has a number of parameters measured and tabulated, namely sepal length, sepal width, petal length and petal width. In other words there is no resultant Y data that is a function of the design space vector. Instead, it is interesting to plot relationships between the measured parameters to determine if they correlate with each other.

A quick internet search brings up a number of examples where the set has been plotted as a gridded set of subplots using various software tools: for example, Mike Bostock’s blog post demonstrating his D3.js package, and the version on the Wikipedia page.

I decided to try coding a Matplotlib script to generate a similar gridded multiplot from the data set. I did this in a Jupyter Notebook (formerly known as IPython Notebook) running Python 2.7. The data could have been imported in a number of different ways, but I used Pandas, since it is designed to work with CSV files containing a mix of types, and plotted it with Matplotlib’s Pyplot module.

The resulting image can be seen below.


Fisher’s Iris data set, sometimes known as Anderson’s Iris data set, visualization by Simon Bance using Matplotlib/Pyplot. A multivariate data set introduced by Ronald Fisher in 1936 from data collected by Edgar Anderson on Iris flowers in the Gaspé Peninsula.

Here is the script:


"""
https://en.wikipedia.org/wiki/Iris_flower_data_set
A script for plotting multivariate tabular data as gridded scatter plots.
"""
import os
import pandas as pd
import matplotlib.pyplot as plt

inFile = r'iris.dat'

# Check if data file exists:
if not os.path.exists(inFile): sys.exit("File %s does not exist" % inFile)

rootFolder = os.path.dirname(os.path.abspath(inFile))

# Read in the data file
df = pd.read_csv(inFile, delimiter="\t")
headers = list(df.columns.values)
df.head(5) # Prints first n lines to check if we loaded the data file as expected.

# We also have n=4 distinct species in the Species column and I will
# list the species names so we can distinguish them later for plotting:
species = list(df.Species.unique()) # normal python list, thank you very much!
print type(species)

# Here we specify how many columns prepend and append the columns that we want to use.
# For Dakota this would include the objective function(s) column(s) appended to the end.
num_precols = 0
num_obj_fn = 1

# Work out the number of dimensions in each design vector:
num_dims = df.shape[1] - num_obj_fn # We know that there are 3 additional columns (and hope that it stays consistent in future)!
print "Our design vector has %s dimensions: %s" % (num_dims, headers[num_precols:-1])
gridshape = (num_dims, num_dims)
num_plots = num_dims**2
print "Our multivariate grid will therefore be of shape", gridshape, "with a total of", num_plots, "plots"

# Plot the data in a grid of subplots.
fig = plt.figure(figsize=(12, 12))

# Iterate over the correct number of plots.
n = 1

# Create an empty 2D list to store created axes. This alows us to edit them somehow.
axes = [[False for i in range(num_dims)] for j in range(num_dims)]

for j in range(num_dims):
for i in range(num_dims):

# e.g. plt.subplot(nx, ny, plotnumber)
ax = fig.add_subplot(num_dims, num_dims, n) # Plot numbering in this case starts from 1 not zero (MATLAB style indexing)!

# Choose your list of colours
colors = ['red', 'green', 'blue']

for index, s in enumerate(species):

# x axis: For each in the species list look at all rows with that value in the Species column.
# Use the ith column of that subset as the x series.
# y axis: Likewisem, but use the jth column.

if i != j:
ax.scatter(df.where(df['Species'] == s).ix[:,i], df.where(df['Species'] == s).ix[:,j], color=colors[index], label=s)
else:
# Put the variable name on the i=j subplots:
ax.text(0.25, 0.5, headers[i])
pass

# Set axis labels:
ax.set_xlabel(headers[i])
ax.set_ylabel(headers[j])

# Hide axes for all but the plots on the edge:
if j < num_dims - 1: ax.xaxis.set_visible(False) if i > 0:
ax.yaxis.set_visible(False)

if i == 1 and j == 0:
ax.legend(bbox_to_anchor=(3.5, 1), loc=2, borderaxespad=0., title="Species name:")

# Add this axis to the list.
axes[j][i] = ax

n += 1

plt.subplots_adjust(left=0.1, right=0.85, top=0.85, bottom=0.1)

plt.savefig("%s/iris.png" % rootFolder, dpi=300)
plt.show()
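As an aside, Pandas ships a scatter_matrix helper that produces a similar grid in a couple of lines, although without the per-species colouring and legend that the script above adds. A minimal sketch, assuming the same tab-separated iris.dat file (the helper lives in pandas.tools.plotting in the Pandas versions current at the time of writing; it moved to pandas.plotting in later releases):

import pandas as pd
import matplotlib.pyplot as plt
from pandas.tools.plotting import scatter_matrix

# Plot all numeric columns against each other, with histograms on the diagonal.
df = pd.read_csv('iris.dat', delimiter="\t")
scatter_matrix(df, figsize=(12, 12), diagonal='hist')
plt.show()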

Further so-called “classic data sets” are listed at https://en.wikipedia.org/wiki/Data_set#Classic_data_sets.




[PDF] “Grain-size dependent demagnetizing factors in permanent magnets” reprint update

The reprint of our Journal of Applied Physics (JAP) paper “Grain-size dependent demagnetizing factors in permanent magnets” has been updated since the old version was not being discovered by the Google Scholar crawler.

There is also now a version on arXiv. I hope that Google Scholar will now correctly index the paper so that it’s easier for people to find!

The full, correct reference for the paper is:

S. Bance, B. Seebacher, T. Schrefl, L. Exl, M. Winklhofer, G. Hrkac, G. Zimanyi, T. Shoji, M. Yano, N. Sakuma, M. Ito, A. Kato and A. Manabe, “Grain-size dependent demagnetizing factors in permanent magnets”, J. Appl. Phys. 116, 233903 (2014); http://dx.doi.org/10.1063/1.4904854




[Paper] Replacement and Original Magnet Engineering Options (ROMEOs): A European Seventh Framework Project to Develop Advanced Permanent Magnets Without, or with Reduced Use of, Critical Raw Materials


Figure from the paper.

Our latest paper has been published in JOM (Springer). It is a summary of the achievements so far of the ROMEO FP7 project.

ROMEO is a project that aims to research and develop novel microstructural-engineering strategies to dramatically improve the properties of magnets based purely on light rare-earth elements, especially their coercivity, enabling their use in applications above 100 °C.

Abstract

The rare-earth crisis, which peaked in the summer of 2011 with the prices of both light and heavy rare earths soaring to unprecedented levels, brought about the widespread realization that the long-term availability and price stability of rare earths could not be guaranteed. This triggered a rapid response from manufacturers involved in rare earths, as well as governments and national and international funding agencies. In the case of rare-earth-containing permanent magnets, three possibilities were given quick and serious consideration: (I) increased recycling of devices containing rare earths; (II) the search for new, mineable, rare-earth resources beyond those in China; and (III) the development of high-energy-product permanent magnets with little or no rare-earth content used in their manufacture. The Replacement and Original Magnet Engineering Options (ROMEO) project addresses the latter challenge using a two-pronged approach. With its basis on work packages that include materials modeling and advanced characterization, the ROMEO project is an attempt to develop a new class of novel permanent magnets that are free of rare earths. Furthermore, the project aims to minimize rare-earth content, particularly heavy-rare-earth (HRE) content, as much as possible in Nd-Fe-B-type magnets. Success has been achieved on both fronts. In terms of new, rare-earth-free magnets, a Heusler alloy database of 236,945 compounds has been narrowed down to approximately 20 new compounds. Of these compounds, Co2MnTi is expected to be a ferromagnet with a high Curie temperature and a high magnetic moment. Regarding the reduction in the amount of rare earths, and more specifically HREs, major progress is seen in electrophoretic deposition as a method for accurately positioning the HRE on the surface prior to its diffusion into the microstructure. This locally increases the coercivity of the rather small Nd-Fe-B-type magnet, thereby substantially reducing the dependence on the HREs Dy and Tb, two of the most critical raw materials identified by the European Commission. Overall, the ROMEO project has demonstrated that rapid progress can be achieved when experts in a specific area are brought together to focus on a particular challenge. With more than half a year of the ROMEO project remaining, further progress and additional breakthroughs can be expected.

Reference

P. McGuiness, O. Akdogan, A. Asali, S. Bance, F. Bittner, J. M. D. Coey, N. M. Dempsey, J. Fidler, D. Givord, O. Gutfleisch, M. Katter, D. Le Roy, S. Sanvito, T. Schrefl, L. Schultz, C. Schwöbl, M. Soderžnik, S. Šturm, P. Tozman, K. Üstüner, M. Venkatesan, T. G. Woodcock, K. Žagar, S. Kobe, “Replacement and Original Magnet Engineering Options (ROMEO): A European 7th Framework project to develop advanced permanent magnets without, or with reduced use of, critical raw materials”, JOM (Springer), 24-4-2015, doi: 10.1007/s11837-015-1412-x