Parallelization in Python example with joblib

It can be ridiculously easy to parallelize code in Python. Check out the following simple example:

import time 
from joblib import Parallel, delayed 

# A function that can be called to do work:
def work(arg):    
    print "Function receives the arguments as a list:", arg
    # Split the list to individual variables:
    i, j = arg    
    # All this work function does is wait 1 second...
    time.sleep(1)    
    # ... and prints a string containing the inputs:
    print "%s_%s" % (i, j)
    return "%s_%s" % (i, j)

# List of arguments to pass to work():
arg_instances = [(1, 1), (1, 2), (1, 3), (1, 4)]
# Anything returned by work() can be stored:
results = Parallel(n_jobs=4, verbose=1, backend="threading")(map(delayed(work), arg_instances))
print results

Output:

Function receives the arguments as a list: (1, 1)
1_1
Function receives the arguments as a list: (1, 2)
1_2
Function receives the arguments as a list: (1, 3)
1_3
Function receives the arguments as a list: (1, 4)
1_4
['1_1', '1_2', '1_3', '1_4']
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    3.9s finished</pre>

As you can see, this simple program executed all four of our argument instances sequentially, because we chose n_jobs = 1, i.e. we told it to use 1 CPU, which means it runs in series. The total time to run is reported as approximately 4 sec (it is actually less than 4 sec, but we won’t concern ourselves with this here!).

Now, we run it again in parallel, but this time with n_jobs = 4:

Function receives the arguments as a list:Function receives the arguments as a list: Function receives the arguments as a list:  Function receives the arguments as a list: (1, 1)(1, 2) (1, 4)

(1, 3)

1_1
1_2
1_41_3

['1_1', '1_2', '1_3', '1_4']
[Parallel(n_jobs=4)]: Done   4 out of   4 | elapsed:    0.9s finished</pre>

As you can see, the internal print commands from all four jobs are being printed to screen more-or-less simultaneously, and not in the original order. Whatever thread finished first gets printed first! The time to finish is now around 1/4 the original time, as expected.