It can be ridiculously easy to parallelize code in Python. Check out the following simple example:
import time
from joblib import Parallel, delayed

# A function that can be called to do work:
def work(arg):
    print("Function receives the arguments as a list:", arg)
    # Split the argument into individual variables:
    i, j = arg
    # All this work function does is wait 1 second...
    time.sleep(1)
    # ... and print a string containing the inputs:
    print("%s_%s" % (i, j))
    return "%s_%s" % (i, j)

# List of arguments to pass to work():
arg_instances = [(1, 1), (1, 2), (1, 3), (1, 4)]
# Anything returned by work() can be stored.
# n_jobs sets the number of workers; n_jobs=1 runs the jobs one after another:
results = Parallel(n_jobs=1, verbose=1, backend="threading")(map(delayed(work), arg_instances))
print(results)
Output:
Function receives the arguments as a list: (1, 1)
1_1
Function receives the arguments as a list: (1, 2)
1_2
Function receives the arguments as a list: (1, 3)
1_3
Function receives the arguments as a list: (1, 4)
1_4
['1_1', '1_2', '1_3', '1_4']
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 3.9s finished
As you can see, this simple program executed all four of our argument instances sequentially, because we chose n_jobs = 1, i.e. we told it to use a single worker, which means the jobs run in series. The total run time is reported as approximately 4 seconds (it is actually slightly less than 4 seconds, but we won’t concern ourselves with this here!).
Now we run it again, this time in parallel with n_jobs = 4:
Function receives the arguments as a list:Function receives the arguments as a list: Function receives the arguments as a list: Function receives the arguments as a list: (1, 1)(1, 2) (1, 4)
(1, 3)
1_1
1_2
1_41_3
['1_1', '1_2', '1_3', '1_4']
[Parallel(n_jobs=4)]: Done 4 out of 4 | elapsed: 0.9s finished
As you can see, the internal print statements from all four jobs are written to the screen more or less simultaneously, and not in the original order: whichever thread finishes first prints first! The returned list of results, however, is still in the original input order. The time to finish is now around 1/4 of the original time (0.9 s instead of 3.9 s), as expected.
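To check the speed-up yourself rather than reading it off the verbose log, the two runs can be timed side by side. The following is only a minimal sketch: the helper `timed_run` is a hypothetical name introduced here, the prints inside `work()` are dropped to keep the output tidy, and the sleep is shortened to 0.2 seconds (an arbitrary choice) so the comparison finishes quickly.

```python
import time
from joblib import Parallel, delayed


def work(arg):
    """Sleep briefly, then return the two inputs joined by an underscore."""
    i, j = arg
    time.sleep(0.2)  # assumption: shorter than the 1 s used above, for a quick demo
    return "%s_%s" % (i, j)


def timed_run(n_jobs, arg_instances):
    """Hypothetical helper: run work() over the inputs, return (results, seconds)."""
    start = time.perf_counter()
    results = Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(work)(arg) for arg in arg_instances
    )
    return results, time.perf_counter() - start


arg_instances = [(1, 1), (1, 2), (1, 3), (1, 4)]
serial_results, serial_time = timed_run(1, arg_instances)
parallel_results, parallel_time = timed_run(4, arg_instances)

# Both runs should return the same list, in input order.
print(serial_results == parallel_results)
print("serial: %.2fs, parallel: %.2fs" % (serial_time, parallel_time))
```

Note that threads help here because `time.sleep` releases Python's global interpreter lock while waiting. For CPU-bound pure-Python work, the threading backend would generally not give this kind of speed-up, and a process-based backend is likely the better choice.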